Tags: performance, finite-automata, memory-efficient

Memory efficient way to store DFA transition table with large alphabet?


I am trying to implement a very simple state transition table for a DFA, but I'm running into some efficiency problems.

If I arrange a transition table with each row as the source state, each column as a possible input, and the intersection as the next state, then transitioning is as simple as:

next_state = table[from_state, input]

This method gives O(1) transition time. The problem arises when there are lots of possible inputs. If I have an alphabet of, say, all 128 ASCII characters, then the table grows out of control and chews through memory for even just a few states, since it needs states * inputs entries.
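For concreteness, here is a minimal sketch of that dense layout (Python used purely as illustration; all names are made up):

    NUM_SYMBOLS = 128  # full ASCII alphabet

    def new_table(num_states):
        # One row per state, one column per input symbol; -1 means "no transition".
        return [[-1] * NUM_SYMBOLS for _ in range(num_states)]

    table = new_table(3)
    table[0][ord('a')] = 1              # state 0 --'a'--> state 1
    table[1][ord('b')] = 2              # state 1 --'b'--> state 2

    def step(table, from_state, ch):
        return table[from_state][ord(ch)]   # O(1) lookup

    # Memory cost: num_states * NUM_SYMBOLS cells, no matter how few
    # transitions are actually defined.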

Another option I considered was to make each column a destination state, with the intersection holding the input symbol that triggers that transition.

input = table[from_state, to_state]

This approach cuts down on memory consumption, but performing a transition requires testing each column until one with the matching input is found, which is O(N) in the number of states.
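Sketched the same way (again Python, names made up):

    def new_inverse_table(num_states):
        # Rows and columns are both states; each cell holds the input
        # symbol (or None) that labels that edge.
        return [[None] * num_states for _ in range(num_states)]

    inv = new_inverse_table(3)
    inv[0][1] = 'a'
    inv[1][2] = 'b'

    def step(inv, from_state, ch):
        # Has to scan every destination column: O(N) in the number of states.
        for to_state, symbol in enumerate(inv[from_state]):
            if symbol == ch:
                return to_state
        return None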

Am I really stuck in a time-space tradeoff, or is there some sort of better way to implement this?

P.S. I am aware that compression algorithms exist, but can they be used to keep the table compressed during its construction?


Solution

  • I think I've decided on an implementation that seems to fit my needs. It uses the same layout as the first approach but keeps a separate tuple that tracks the smallest and largest codepoints used so far. The table expands its columns only to cover that range. Since the inputs will most likely fall within a fairly tight range of codepoints, this should be sufficient.

    Indexing then just becomes (input - lowest_codepoint), starting from index 0; a rough sketch of the scheme is below.
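
    A minimal sketch, assuming Python; the class and method names are only illustrative:

        class RangeDFA:
            def __init__(self):
                self.lo = None      # smallest codepoint seen so far
                self.hi = None      # largest codepoint seen so far
                self.rows = []      # rows[state][code - lo] -> next state, -1 = none

            def _grow(self, code):
                # Widen every existing row so [lo, hi] covers the new codepoint.
                new_lo = code if self.lo is None else min(self.lo, code)
                new_hi = code if self.hi is None else max(self.hi, code)
                width = new_hi - new_lo + 1
                pad_left = 0 if self.lo is None else self.lo - new_lo
                for i, row in enumerate(self.rows):
                    padded = [-1] * width
                    for j, v in enumerate(row):
                        padded[pad_left + j] = v
                    self.rows[i] = padded
                self.lo, self.hi = new_lo, new_hi

            def add_state(self):
                width = 0 if self.lo is None else self.hi - self.lo + 1
                self.rows.append([-1] * width)
                return len(self.rows) - 1

            def add_transition(self, src, ch, dst):
                code = ord(ch)
                if self.lo is None or not (self.lo <= code <= self.hi):
                    self._grow(code)
                self.rows[src][code - self.lo] = dst

            def step(self, state, ch):
                code = ord(ch)
                if self.lo is None or not (self.lo <= code <= self.hi):
                    return -1       # outside the stored range: no transition
                return self.rows[state][code - self.lo]

        # Usage:
        dfa = RangeDFA()
        s0, s1 = dfa.add_state(), dfa.add_state()
        dfa.add_transition(s0, 'a', s1)
        dfa.step(s0, 'a')           # -> s1; the table is only one column wide so far

    Widening the range is a one-time cost per new codepoint during construction; lookups stay O(1).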