morfologik.fsa
Class FSA

java.lang.Object
  extended by morfologik.fsa.FSA
All Implemented Interfaces:
java.lang.Iterable<java.nio.ByteBuffer>
Direct Known Subclasses:
FSAVer5Impl

public abstract class FSA
extends java.lang.Object
implements java.lang.Iterable<java.nio.ByteBuffer>

This class implements Finite State Automaton traversal as described in Jan Daciuk's Incremental Construction of Finite-State Automata and Transducers, and Their Use in the Natural Language Processing (PhD thesis, Technical University of Gdansk).

This is an abstract base class for all forms of binary storage present in Jan Daciuk's FSA package.


Field Summary
protected  byte filler
          The meaning of this field is not clear (check the FSA documentation).
protected  byte gotoLength
          Size of a transition's destination node "address".
protected  byte version
          Dictionary version (derived from the combination of flags).
static byte VERSION_5
          Version number for version 5 of the automaton.
 
Constructor Summary
protected FSA(java.io.InputStream fsaStream, java.lang.String dictionaryEncoding)
          Creates a new automaton reading the FSA automaton from an input stream.
 
Method Summary
 char getAnnotationSeparator()
          Return the annotation separator character, converted to a character according to the encoding scheme passed to the constructor of this class.
abstract  int getArc(int node, byte label)
          Returns the identifier of an arc leaving node and labeled with label.
abstract  byte getArcLabel(int arc)
          Return the label associated with a given arc.
abstract  int getEndNode(int arc)
          Return the end node pointed to by a given arc.
 char getFillerCharacter()
          Return the filler character, converted to a character according to the encoding scheme passed to the constructor of this class.
abstract  int getFirstArc(int node)
          Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.
 int getFlags()
          Returns a set of flags for this FSA instance.
static FSA getInstance(java.io.File fsaFile, java.lang.String dictionaryEncoding)
          This static method will attempt to instantiate an appropriate implementation of the FSA for the version found in the file given as the input argument.
static FSA getInstance(java.io.InputStream fsaStream, java.lang.String dictionaryEncoding)
          This static method will attempt to instantiate an appropriate implementation of the FSA for the version found in the stream given as the input argument.
abstract  int getNextArc(int node, int arc)
          Returns the identifier of the next arc after arc and leaving node.
abstract  int getNumberOfArcs()
          Returns the number of arcs in this automaton.
abstract  int getNumberOfNodes()
          Returns the number of nodes in this automaton.
abstract  int getRootNode()
          Returns the identifier of the root node of this automaton.
 FSATraversalHelper getTraversalHelper()
          Returns an object which can be used to walk the edges of this finite state automaton and match arbitrary sequences against its states.
 int getVersion()
          Returns the version number of the binary representation of this FSA.
abstract  boolean isArcFinal(int arc)
          Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
abstract  boolean isArcTerminal(int arc)
          Returns true if this arc does not have a terminating node.
 java.util.Iterator<java.nio.ByteBuffer> iterator()
          Returns an iterator over all binary sequences starting from the initial FSA state and ending in final nodes.
protected  byte[] readFully(java.io.InputStream stream)
          Reads all bytes from an input stream.
protected  void readHeader(java.io.DataInput in, long fileSize)
          Reads an FSA header from a stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

VERSION_5

public static final byte VERSION_5
Version number for version 5 of the automaton.

See Also:
Constant Field Values

version

protected byte version
Dictionary version (derived from the combination of flags).


filler

protected byte filler
The meaning of this field is not clear (check the FSA documentation).


gotoLength

protected byte gotoLength
Size of a transition's destination node "address". This field may also have a different interpretation, or may not be used at all, depending on the combination of flags used for building the FSA.

Constructor Detail

FSA

protected FSA(java.io.InputStream fsaStream,
              java.lang.String dictionaryEncoding)
       throws java.io.IOException
Creates a new automaton reading the FSA automaton from an input stream.

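Parameters:
fsaStream - An input stream with the FSA automaton.
dictionaryEncoding - The encoding scheme used to convert the annotation separator and filler bytes to characters.
Throws:
java.io.IOException - if the dictionary cannot be read, or if the version of the file is not supported.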
Method Detail

getVersion

public final int getVersion()
Returns the version number of the binary representation of this FSA.

The version number is derived from the combination of flags and is exactly the same as in Jan Daciuk's FSA package.


getFlags

public final int getFlags()
Returns a set of flags for this FSA instance. Each flag is represented by a unique bit in the returned integer. Therefore, to check whether the dictionary was built with the FSAFlags.FLEXIBLE flag, perform a bitwise AND:
boolean isFlexible = ((dict.getFlags() & FSA.FSA_FLEXIBLE) != 0);
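The bitwise test described for getFlags() can be sketched in isolation as follows. The flag names and bit values are hypothetical placeholders chosen for illustration; the real constants are defined by the library and are not reproduced here.

```java
/** Illustrates testing individual bits in an int-valued flag set. */
final class FlagCheckExample {
    // Hypothetical flag bits, for illustration only (not the library's values).
    static final int HYPOTHETICAL_FLEXIBLE = 1 << 0;
    static final int HYPOTHETICAL_STOPBIT = 1 << 1;
    static final int HYPOTHETICAL_NEXTBIT = 1 << 2;

    /** Returns true if the single-bit {@code flag} is set in {@code flags}. */
    static boolean hasFlag(int flags, int flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        // Stand-in for the value that getFlags() would return.
        int flags = HYPOTHETICAL_FLEXIBLE | HYPOTHETICAL_NEXTBIT;
        System.out.println("flexible: " + hasFlag(flags, HYPOTHETICAL_FLEXIBLE));
        System.out.println("stopbit:  " + hasFlag(flags, HYPOTHETICAL_STOPBIT));
    }
}
```

Because each flag occupies its own bit, several flags can be combined with a bitwise OR and tested independently.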


getAnnotationSeparator

public final char getAnnotationSeparator()
Return the annotation separator character, converted to a character according to the encoding scheme passed to the constructor of this class.


getFillerCharacter

public final char getFillerCharacter()
Return the filler character, converted to a character according to the encoding scheme passed to the constructor of this class.
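The separator and filler are stored as single bytes and exposed as characters. One plausible way to perform such a conversion is sketched below using only the standard java.nio.charset API. This is an assumption about the general technique, not the library's actual code; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

/** Sketch: decode one byte into a char using a named encoding. */
final class ByteToCharExample {
    /**
     * Decodes a single byte with the given encoding.
     * Assumes a single-byte encoding, where one byte maps to exactly one char.
     */
    static char decode(byte value, String encodingName) {
        Charset charset = Charset.forName(encodingName);
        return charset.decode(ByteBuffer.wrap(new byte[] { value })).get(0);
    }

    public static void main(String[] args) {
        // '+' is a commonly used separator; it is only an example value here.
        System.out.println(decode((byte) '+', "ISO-8859-1"));
    }
}
```

With a multi-byte encoding such as UTF-8, a lone byte above 0x7F is not a valid character, which is why the sketch restricts itself to single-byte encodings.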


getNumberOfArcs

public abstract int getNumberOfArcs()
Returns the number of arcs in this automaton. Depending on the representation of the automaton, this method may take a long time to finish.


getNumberOfNodes

public abstract int getNumberOfNodes()
Returns the number of nodes in this automaton. Depending on the representation of the automaton, this method may take a long time to finish.


getTraversalHelper

public FSATraversalHelper getTraversalHelper()
Returns an object which can be used to walk the edges of this finite state automaton and match arbitrary sequences against its states.


getInstance

public static FSA getInstance(java.io.File fsaFile,
                              java.lang.String dictionaryEncoding)
                       throws java.io.IOException
This static method will attempt to instantiate an appropriate implementation of the FSA for the version found in the file given as the input argument.

Throws:
java.io.IOException - An exception is thrown if no corresponding FSA parser is found or if the input file cannot be opened.

getInstance

public static FSA getInstance(java.io.InputStream fsaStream,
                              java.lang.String dictionaryEncoding)
                       throws java.io.IOException
This static method will attempt to instantiate an appropriate implementation of the FSA for the version found in the stream given as the input argument.

Throws:
java.io.IOException - An exception is thrown if no corresponding FSA parser is found or if the input stream cannot be read.

readHeader

protected void readHeader(java.io.DataInput in,
                          long fileSize)
                   throws java.io.IOException
Reads an FSA header from a stream.

Throws:
java.io.IOException - If the stream is not a dictionary, or if the version is not supported.

readFully

protected byte[] readFully(java.io.InputStream stream)
                    throws java.io.IOException
Reads all bytes from an input stream.

Parameters:
stream - The input stream to read from.
Returns:
An array containing the bytes read.
Throws:
java.io.IOException

iterator

public java.util.Iterator<java.nio.ByteBuffer> iterator()
Returns an iterator over all binary sequences starting from the initial FSA state and ending in final nodes. The iterator returns a ByteBuffer whose content changes on each call to Iterator.next(), so if the content must be preserved, it has to be copied elsewhere.

It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.

Specified by:
iterator in interface java.lang.Iterable<java.nio.ByteBuffer>
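Since the buffer handed out by the iterator is overwritten on each step, callers that want to keep the sequences need a defensive copy. The sketch below shows one way to do this for any Iterable of ByteBuffer. The reusingSource method is a hypothetical stand-in that mimics the buffer reuse described above; in real use the Iterable would be an FSA instance.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch: preserve sequences from an iterator that reuses its buffer. */
final class BufferCopyExample {
    /** Copies the remaining content of a buffer into a fresh array. */
    static byte[] copyOf(ByteBuffer buffer) {
        byte[] copy = new byte[buffer.remaining()];
        // duplicate() shares the content but has its own position,
        // so the caller's buffer is left untouched.
        buffer.duplicate().get(copy);
        return copy;
    }

    /** Collects an independent copy of every sequence. */
    static List<byte[]> collect(Iterable<ByteBuffer> source) {
        List<byte[]> result = new ArrayList<>();
        for (ByteBuffer buffer : source) {
            result.add(copyOf(buffer));
        }
        return result;
    }

    /** Hypothetical stand-in: one shared buffer, sequences of at most 16 bytes. */
    static Iterable<ByteBuffer> reusingSource(final byte[][] sequences) {
        return () -> new Iterator<ByteBuffer>() {
            private final ByteBuffer shared = ByteBuffer.allocate(16);
            private int index = 0;

            @Override public boolean hasNext() {
                return index < sequences.length;
            }

            @Override public ByteBuffer next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.clear();
                shared.put(sequences[index++]);
                shared.flip();
                return shared; // same object every time, new content
            }
        };
    }
}
```

Storing the returned ByteBuffer references directly, without copying, would leave every stored reference pointing at the content of the last sequence.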

getRootNode

public abstract int getRootNode()
Returns the identifier of the root node of this automaton. May return 0 if the start node is also the end node.

See Also:
getTraversalHelper()

getFirstArc

public abstract int getFirstArc(int node)
Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.

See Also:
getTraversalHelper()

getArc

public abstract int getArc(int node,
                           byte label)
Returns the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.

See Also:
getTraversalHelper()

getNextArc

public abstract int getNextArc(int node,
                               int arc)
Returns the identifier of the next arc after arc and leaving node. Zero is returned if no more arcs are available for the node.

See Also:
getTraversalHelper()
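The getFirstArc and getNextArc contracts, with 0 as the "no arc" value, suggest a simple loop over a node's outgoing arcs. The sketch below runs that loop against a tiny array-backed toy automaton. The Arcs interface is a hypothetical subset of the methods documented here, and the toy's convention that a node identifier equals the identifier of its first arc is an assumption made for the example only.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: enumerate and search the arcs leaving a node. */
final class ArcIterationExample {
    /** Hypothetical subset of the arc-access methods. */
    interface Arcs {
        int getFirstArc(int node);
        int getNextArc(int node, int arc);
        byte getArcLabel(int arc);
    }

    /** Collects the labels of all arcs leaving {@code node}. */
    static List<Byte> outgoingLabels(Arcs fsa, int node) {
        List<Byte> labels = new ArrayList<>();
        // 0 is the "no arc" value for both getFirstArc and getNextArc.
        for (int arc = fsa.getFirstArc(node); arc != 0; arc = fsa.getNextArc(node, arc)) {
            labels.add(fsa.getArcLabel(arc));
        }
        return labels;
    }

    /** Linear search for an arc with the given label; returns 0 if none exists. */
    static int findArc(Arcs fsa, int node, byte label) {
        for (int arc = fsa.getFirstArc(node); arc != 0; arc = fsa.getNextArc(node, arc)) {
            if (fsa.getArcLabel(arc) == label) return arc;
        }
        return 0;
    }

    /** Toy automaton: node 1 has arc 1 ('a'); node 2 has arcs 2 ('b') and 3 ('c'). */
    static Arcs toy() {
        final byte[] label = { 0, 'a', 'b', 'c' };            // index 0 is unused
        final boolean[] last = { false, true, false, true };  // last arc of its node?
        return new Arcs() {
            public int getFirstArc(int node) { return node; }
            public int getNextArc(int node, int arc) { return last[arc] ? 0 : arc + 1; }
            public byte getArcLabel(int arc) { return label[arc]; }
        };
    }

    public static void main(String[] args) {
        System.out.println(outgoingLabels(toy(), 2));
    }
}
```

The findArc method shows one way a getArc(node, label) lookup could be built on top of the two iteration methods; concrete implementations may use a faster strategy.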

getEndNode

public abstract int getEndNode(int arc)
Return the end node pointed to by a given arc. Terminal arcs (those that point to a terminal state) have no end node representation; calling this method on such an arc throws a runtime exception.

See Also:
getTraversalHelper()

getArcLabel

public abstract byte getArcLabel(int arc)
Return the label associated with a given arc.


isArcFinal

public abstract boolean isArcFinal(int arc)
Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.


isArcTerminal

public abstract boolean isArcTerminal(int arc)
Returns true if this arc does not have a terminating node.
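Taken together, isArcFinal, isArcTerminal and getEndNode are enough to enumerate every sequence stored in an automaton with a depth-first walk. The sketch below illustrates this on a toy automaton holding "a", "ab" and "ac". The Automaton interface is a hypothetical subset of the methods documented here, the toy's storage layout is invented for the example, and the choice of IllegalStateException for terminal arcs is an assumption (the documentation only says "a runtime exception").

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: depth-first enumeration of all sequences in an automaton. */
final class SequenceWalkExample {
    /** Hypothetical subset of the traversal methods. */
    interface Automaton {
        int getRootNode();
        int getFirstArc(int node);
        int getNextArc(int node, int arc);
        byte getArcLabel(int arc);
        int getEndNode(int arc);
        boolean isArcFinal(int arc);
        boolean isArcTerminal(int arc);
    }

    static List<String> sequences(Automaton fsa) {
        List<String> out = new ArrayList<>();
        // Fixed-size scratch buffer; large enough for the toy data only.
        walk(fsa, fsa.getRootNode(), new byte[64], 0, out);
        return out;
    }

    private static void walk(Automaton fsa, int node, byte[] prefix, int depth, List<String> out) {
        for (int arc = fsa.getFirstArc(node); arc != 0; arc = fsa.getNextArc(node, arc)) {
            prefix[depth] = fsa.getArcLabel(arc);
            if (fsa.isArcFinal(arc)) {
                // A final arc ends one of the input sequences.
                out.add(new String(prefix, 0, depth + 1, StandardCharsets.ISO_8859_1));
            }
            if (!fsa.isArcTerminal(arc)) {
                // Only non-terminal arcs have an end node to descend into.
                walk(fsa, fsa.getEndNode(arc), prefix, depth + 1, out);
            }
        }
    }

    /** Toy automaton: arc 1 ('a') leads to node 2, which has arcs 2 ('b') and 3 ('c'). */
    static Automaton toy() {
        final byte[] label = { 0, 'a', 'b', 'c' };            // index 0 is unused
        final int[] end = { 0, 2, 0, 0 };                     // 0 marks a terminal arc
        final boolean[] fin = { false, true, true, true };
        final boolean[] last = { false, true, false, true };
        return new Automaton() {
            public int getRootNode() { return 1; }
            public int getFirstArc(int node) { return node; }
            public int getNextArc(int node, int arc) { return last[arc] ? 0 : arc + 1; }
            public byte getArcLabel(int arc) { return label[arc]; }
            public int getEndNode(int arc) {
                if (end[arc] == 0) throw new IllegalStateException("terminal arc: " + arc);
                return end[arc];
            }
            public boolean isArcFinal(int arc) { return fin[arc]; }
            public boolean isArcTerminal(int arc) { return end[arc] == 0; }
        };
    }

    public static void main(String[] args) {
        System.out.println(sequences(toy()));
    }
}
```

Note that an arc can be final without being terminal: in the toy data the arc labeled 'a' ends the sequence "a" and also leads on to "ab" and "ac".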