I have heard many good things about OpenFST, yet I struggle with making it work. I am constructing an FST automaton (fstcompile) that I want to use as an acceptor to check if a set of strings are matching (very much alike regular expressions but with the advantages provided by optimizations of the automatons provided by OpenFST). And here is the thing:
How to check if the resulting automaton accepts a string?
I found a suggestion that the input string shall be turned into a simple automaton and composed with the accepting automaton to get a result. I found it highly cumbersome and strange. Is there an easier way (either via cmd line or Python/C++)?
Here's a quick example on how you can test whether an automaton accepts a string using Open FST's Python wrapper. Indeed, you have to turn your input into an automaton, and Open FST doesn't even create this "linear chain automata" for you! Fortunately, it's simple to automate this process as seen below:
def linear_fst(elements, automata_op, keep_isymbols=True, **kwargs):
"""Produce a linear automata."""
compiler = fst.Compiler(isymbols=automata_op.input_symbols().copy(),
acceptor=keep_isymbols,
keep_isymbols=keep_isymbols,
**kwargs)
for i, el in enumerate(elements):
print >> compiler, "{} {} {}".format(i, i+1, el)
print >> compiler, str(i+1)
return compiler.compile()
def apply_fst(elements, automata_op, is_project=True, **kwargs):
"""Compose a linear automata generated from `elements` with `automata_op`.
Args:
elements (list): ordered list of edge symbols for a linear automata.
automata_op (Fst): automata that will be applied.
is_project (bool, optional): whether to keep only the output labels.
kwargs:
Additional arguments to the compiler of the linear automata .
"""
linear_automata = linear_fst(elements, automata_op, **kwargs)
out = fst.compose(linear_automata, automata_op)
if is_project:
out.project(project_output=True)
return out
def accepted(output_apply):
"""Given the output of `apply_fst` for acceptor, return True is sting was accepted."""
return output_apply.num_states() != 0
Let's define a simple Acceptor that only accepts series of "ab":
f_ST = fst.SymbolTable()
f_ST.add_symbol("<eps>", 0)
f_ST.add_symbol("a", 1)
f_ST.add_symbol("b", 2)
compiler = fst.Compiler(isymbols=f_ST, osymbols=f_ST, keep_isymbols=True, keep_osymbols=True, acceptor=True)
print >> compiler, "0 1 a"
print >> compiler, "1 2 b"
print >> compiler, "2 0 <eps>"
print >> compiler, "2"
fsa_abs = compiler.compile()
fsa_abs
Now we can simply apply the Acceptor using :
accepted(apply_fst(list("abab"), fsa_abs))
# True
accepted(apply_fst(list("ba"), fsa_abs))
# False
To see how to use the transducer look at my other answer