Search code examples
cperformancedata-structurestreetrie

Trie implementation efficiency


Which is a more efficient. A Trie structure like this:

struct TrieNode              
{
char letter;              
bool isWord;                
     TrieNode* subNodes[26]; 
};

or a Trie structure like this:

struct TrieNode
{ 
    char letter;
    bool isword;
    map<int, TrieNode*> subNodes;
};

Or is there just a better implementation altogether... Also, could someone give me an explanation?


Solution

  • I would use the first, for simplicity and speed, but conceivably the second could save space.

    You don't need the char letter element in either code. It's redundant, because the way you look up a word is to take a letter of the key and use it either as an index into the subNode array, or a key into your map, so as to choose a subNode. Either way, you never need to look at letter.

    The way you know if the word is not in the trie is if you hit a null subnode, or if you exhaust your key without hitting a isWord subnode.

    BTW, if your trie doesn't contain too many words, and if it does not change very often, you will always save about an order of magnitude of speed by transforming it into ad-hoc code.


    EDIT What I mean by ad-hoc code is that a trie is a kind of finite-state-machine, and a finite-state-machine is a kind of program. So you write a program to read the sorted dictionary, but instead of building a trie data structure, it writes out a program in your favorite language that looks like this:

    // XYZ is the prefix string that corresponds to a node in the trie
    bool XYZFunc(char* key){
        switch (*key){
        case '\0': return true /* if XYZ is a valid word, else false */; break;
        case 'a': return XYZaFunc(key+1); break;
        case 'b': return XYZbFunc(key+1); break;
        // etc. etc.
        }
    }
    

    This could be a lot of functions, but within reason the compiler should be able to handle it. Then to look up a word you just call the top-level function, and it returns true or false. At each node, the compiler will determine if it needs a jump table or not, so you don't have to worry about that.