Search code examples
data-structurestrie

Can a crit-bit tree store a non-prefix-free set of strings space-efficiently?


The standard description of a crit-bit tree shows how to implement one using only three machine words per node: length, left pointer, right pointer. However, this encoding is only valid for storing a set of prefix-free strings; it is not able to encode sets like {"a", "aa", "aaa"}. Is there a variant that allows storing any set of strings while keeping the nodes small?

Obviously any set of strings can be made prefix-free by padding them out to the same length, but that wastes space unless it can be done implicitly, and the length of the longest string may not be known in advance.


Solution

  • A better description of a crit-bit tree can be found here:http://code.google.com/p/radixtree/. IMO a crit bit tree is not limited to a prefix free set of strings. You can try this function

    unsigned int bit(size_t pos, unsigned char const* k, size_t klen) {
      if (pos/(CHAR_BIT+1)>=klen) return 0;
      if (pos%(CHAR_BIT+1)==0) return 1;
      return (((unsigned int)k[pos/(CHAR_BIT+1)])>>(CHAR_BIT-pos%(CHAR_BIT+1)))&(unsigned int)1;
    }
    

    It's from the kart tree:http://code.dogmap.org/kart/