Search code examples
javastringalgorithmstring-matchingfinite-automata

Finite Automata String Matcher


I am trying to build a FA string matcher using java. I have the following pseudocode.

enter image description here

For the Finite-Automata-Matcher algorithm to work the transition function has to be computed. The following algorithm Compute-Transition-Function computes given a the pattern P and the alphabet sigma.

enter image description here

In the above code I couldn't understand where did min(m + 1, q + 2) come from. (I did understand why it is k = min(m + 1, q + 2) instead of k = min(m, q + 1) but why we want the minimum of m and q+1 in the first place)

In between lines at 5-7 it decreases k by one until Pk is a suffix of Pqa, but I couldn't understand what Pqa stands for.

Also, how can I convert the line 8 to a java code? Would a two-dimensional array be sufficient or do I need another data structure.

A related question: string matching with finite automata


Solution

  • On the internal repeat-until loop say we have Pq = 'abdab' and string is 'abdabcd', and our alphabet is abcd, and we are looking for best alternative for every symbol from alphabet, and then store transition to the new state. In case above, by 'a', we should move to the beginning, 'b' to the very beginning, c prolongates match, and d symbol should store pointer to the third symbol in our initial string. So Pqa should be read as Pq plus character a from alphabet.

    Edit why we want min of (q+2 and m+1), because we would like to perform one step forward, and we also would like to limit length of string, which is obvious. Why cannot we perform q+3, +4? It's because we are adding just one character, and it's not possible to extend best match from q to q+2,+3, by just a single character.