
How can a compact trie be beneficial?


I just learned that a compact trie can save memory compared to a regular trie by storing three integers as a reference to a string object instead of actually storing a string in each node. However, I'm still confused about how this actually saves memory.

If each node of a compact trie stores 3 integers referring to a String object, wouldn't that save no memory at all if that String object is also stored in memory?

If that is the case, is the compact trie only beneficial when we store the String objects on disk?


Solution

  • The compact storage of a compressed trie can be more space efficient if you're already storing the strings for some other purpose.

    The compact and non-compact versions will have similar memory usage if you're also counting the storing of the strings. The compact version might even be worse, depending on how many bytes your integers and pointers require.

    For a non-compact compressed trie (see the sketch after this list):

    • Each node would have a string which requires a pointer (let's say 4 bytes) and a length (2 bytes). This gives us 6 bytes.

    • On top of this we'd need to store the actual string (even if we're already storing these strings elsewhere).
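
    To make the per-node cost concrete, here is a minimal sketch of such a node in Java. The class and field names are illustrative (not from any particular library), and real JVM reference and field sizes differ from the 4- and 2-byte figures used above; the point is only what each node has to hold.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: a non-compact compressed trie node stores its edge label
    // directly as a String, so each node pays for the reference to the
    // String plus the String's own length/bookkeeping.
    class CompressedTrieNode {
        String label;                                   // the edge label, owned by this node
        Map<Character, CompressedTrieNode> children = new HashMap<>();
        boolean isWord;                                 // true if a stored word ends at this node

        CompressedTrieNode(String label) {
            this.label = label;
        }
    }
    ```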

    For a compact compressed trie (see the sketch after this list):

    • Each node would have 3 integers (2 bytes each). This gives us 6 bytes.
    • If we're counting storing the strings as well, we'd have the actual strings (which is the same as the above), in addition to the overhead (pointer and length) of those strings.
    • Given these numbers (if we're counting storing the strings), this version would be worse.
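
    For comparison, here is a hedged sketch of the compact version, assuming the common layout in which the three integers identify a string in a shared collection plus the start and end of this node's label within it (again, the names are hypothetical):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: a compact compressed trie node describes its edge label as a
    // slice of a string that is stored elsewhere anyway, using three integers.
    class CompactTrieNode {
        int stringIndex;  // which string in the shared collection
        int start;        // start offset of this node's label in that string
        int end;          // end offset (exclusive)
        Map<Character, CompactTrieNode> children = new HashMap<>();
        boolean isWord;

        CompactTrieNode(int stringIndex, int start, int end) {
            this.stringIndex = stringIndex;
            this.start = start;
            this.end = end;
        }

        // Recover the label on demand from strings we already keep around.
        String label(String[] sharedStrings) {
            return sharedStrings[stringIndex].substring(start, end);
        }
    }
    ```

    The saving only materialises because sharedStrings is something you were keeping anyway; if the trie were the only reason the strings exist, the three integers plus the shared strings add up to at least as much as storing the labels in the nodes.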

    For a non-compressed trie (that is, where we store only one character per node; see the sketch after this list):

    • Each node directly stores its one character.
    • There would be no string overhead.
    • However, long chains of single-character nodes mean many more nodes overall, so this representation can end up being less time and space efficient (especially considering the cost of jumping between many locations in memory instead of reading one sequential block).
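
    Finally, a sketch of the non-compressed, one-character-per-node version (illustrative names again):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: a plain (non-compressed) trie node holds a single character,
    // so there is no string overhead, but a word of length n needs n nodes.
    class TrieNode {
        char letter;
        Map<Character, TrieNode> children = new HashMap<>();
        boolean isWord;

        TrieNode(char letter) {
            this.letter = letter;
        }
    }
    ```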