Search code examples
cstring-matchingstate-machinealphabet

Limiting the alphabet for finite state automata string matching


Hi I have programmed my finite state automata string matching algorithm. However I am struggling to limit the alphabet to only two characters. My implementation looks similar to http://www.sanfoundry.com/cpp-program-perform-finite-state-automaton-based-search/.

The NO_OF_CHAR variable is stating the alphabet of the program. I am trying to limit this to only two characters {0,1} eg: 0101001. If anyone has knowledge of finite state automata, input would be appreciated.


Solution

  • An answer was already accepted but I post this based on the OP insistence in an earlier question on this topic, that there must be only 2 possibilities.

    int TF[][NO_OF_CHARS] is an array originally sized by #define NO_OF_CHARS 256. So in the example all possible unsigned char values can index it. When you try to reduce the number of chars to 2 you can only index this array by 0 or 1, but if your '0' and '1' in the cellular string are ASCII values, they will break the array.

    Based on that this line (and possibly others) are foxing the array

    state = TF[state][txt[i]];
    

    Notice that with characters '0' and '1' the array will be indexed by 48 and 49. What you need to do here, and possibly elsewhere, is

    state = TF[state][txt[i] & 1];
    

    Also watch out whether there are any places where this index of 0 or 1 is turned back into a char. If so you will need to add '0' to the array index.