
C# Binary Trees and Dictionaries

I'm struggling with the concept of when to use binary search trees and when to use dictionaries.

In my application I ran a small experiment using the C5 library's TreeDictionary (which I believe is a red-black binary search tree) and the C# Dictionary. The Dictionary was always faster at add/find operations and also always used less memory. For example, with 16809 <int, float> entries, the Dictionary used 342 KiB whilst the tree used 723 KiB.
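A rough sketch of how such a comparison could be reproduced without C5 follows. It swaps in the BCL's SortedDictionary<int, float> (a tree-based dictionary) for TreeDictionary and measures each collection with GC.GetTotalMemory. The helper name MeasureBytes is hypothetical, and the numbers it prints are approximate and will not match the figures above exactly.

```csharp
using System;
using System.Collections.Generic;

// Rough memory comparison in the spirit of the experiment above.
// Assumption: SortedDictionary stands in for C5's TreeDictionary;
// GC.GetTotalMemory deltas are only approximate.
const int Count = 16809;

// Hypothetical helper: allocation delta of whatever `build` creates.
static long MeasureBytes(Func<object> build, out object keepAlive)
{
    long before = GC.GetTotalMemory(forceFullCollection: true);
    keepAlive = build(); // keep a reference so the collection is not collected
    long after = GC.GetTotalMemory(forceFullCollection: true);
    return after - before;
}

long hashBytes = MeasureBytes(() =>
{
    var d = new Dictionary<int, float>();
    for (int i = 0; i < Count; i++) d[i] = i * 0.5f;
    return d;
}, out object hashTable);

long treeBytes = MeasureBytes(() =>
{
    var t = new SortedDictionary<int, float>();
    for (int i = 0; i < Count; i++) t[i] = i * 0.5f;
    return t;
}, out object tree);

Console.WriteLine($"Dictionary:       ~{hashBytes / 1024} KiB");
Console.WriteLine($"SortedDictionary: ~{treeBytes / 1024} KiB");
GC.KeepAlive(hashTable);
GC.KeepAlive(tree);
```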

I thought that BSTs were supposed to be more memory efficient, but it seems that one node of the tree requires more bytes than one entry in a dictionary. What gives? Is there a point where BSTs are better than dictionaries?

Also, as a side question, does anyone know of a faster and more memory-efficient data structure for storing <int, float> pairs with dictionary-style access than either of the structures mentioned?


  • I thought that BSTs were supposed to be more memory efficient, but it seems that one node of the tree requires more bytes than one entry in a dictionary. What gives? Is there a point where BSTs are better than dictionaries?

    I've personally never heard of such a principle. Even if it holds, it's only a general principle, not a categorical fact etched in the fabric of the universe.

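    Conceptually, a Dictionary is just a fancy wrapper around an array of linked lists (one list per bucket). (The real .NET Dictionary<TKey, TValue> stores each chain as indices into a single contiguous entries array rather than as separate list objects, which is part of why it is so compact.) Inserting into a dictionary looks something like this: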

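    // Pick the bucket; masking the sign bit keeps the index non-negative.
    int bucket = (key.GetHashCode() & 0x7FFFFFFF) % internalArray.Length;
    LinkedList<KeyValuePair<TKey, TValue>> list = internalArray[bucket];
    // Reject duplicates by scanning this bucket's chain (needs System.Linq).
    if (list.Any(x => EqualityComparer<TKey>.Default.Equals(x.Key, key)))
        throw new ArgumentException("Key already exists");
    list.AddLast(new KeyValuePair<TKey, TValue>(key, value));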

    So insertion is a nearly O(1) operation on average. The dictionary uses O(internalArray.Length + n) memory, where n is the number of items in the collection.
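    To make the bucket idea concrete, here is a minimal, self-contained sketch of separate chaining for the <int, float> case. The fixed count of 17 buckets and the names BucketOf, Add and TryGet are hypothetical. There is no resizing, and this is not how System.Collections.Generic.Dictionary is actually laid out.

```csharp
using System;
using System.Collections.Generic;

// Minimal separate-chaining hash table: an array of buckets, each a linked
// list of key/value pairs. Fixed bucket count, no resizing, int keys only.
var buckets = new LinkedList<KeyValuePair<int, float>>[17];
for (int i = 0; i < buckets.Length; i++)
    buckets[i] = new LinkedList<KeyValuePair<int, float>>();

// Map a key to a bucket; masking the sign bit keeps the index non-negative.
int BucketOf(int key) => (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;

void Add(int key, float value)
{
    var list = buckets[BucketOf(key)];
    foreach (var pair in list)           // walk the chain looking for a duplicate
        if (pair.Key == key)
            throw new ArgumentException("Key already exists");
    list.AddLast(new KeyValuePair<int, float>(key, value));
}

bool TryGet(int key, out float value)
{
    foreach (var pair in buckets[BucketOf(key)])
    {
        if (pair.Key == key) { value = pair.Value; return true; }
    }
    value = default;
    return false;
}

Add(3, 1.5f);
Add(20, 2.5f); // 20 % 17 == 3, so key 20 lands in the same bucket (chain) as key 3
Console.WriteLine(TryGet(20, out float v) ? v : float.NaN);
```

    Collisions only lengthen a chain, so as long as the hash spreads keys evenly, each chain stays short and lookups remain close to O(1).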

    In general, BSTs can be implemented as:

    • linked nodes (each node holding references to its children), which use O(n) space, where n is the number of items in the collection.
    • arrays, which need up to 2^h - 1 slots for a tree of height h, so roughly 2^h - n of those slots sit empty.
      • Since a red-black tree's height is bounded by 2 * log2(n + 1), an array implementation could in the worst case need about 2^(2 * log2(n + 1)) = (n + 1)^2 slots, i.e. O(n^2) space.
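    The worst-case arithmetic above can be checked with a few lines of code. The helper name WorstCaseArraySlots is hypothetical. It assumes the standard red-black height bound h <= 2 * log2(n + 1) and the usual array layout where the children of slot i sit at slots 2i + 1 and 2i + 2.

```csharp
using System;

// Worst-case slot count for an array-backed red-black tree, assuming the
// height bound h <= 2 * log2(n + 1). With children of slot i at 2i + 1 and
// 2i + 2, every level down to the deepest node must be allocated in full.
static long WorstCaseArraySlots(int n)
{
    int levels = (int)Math.Floor(2 * Math.Log2(n + 1)); // maximum tree height
    return (1L << levels) - 1;                          // slots in `levels` full levels
}

int n = 16809; // entry count from the question
long slots = WorstCaseArraySlots(n);
Console.WriteLine($"Node-based: {n} nodes");
Console.WriteLine($"Array-based worst case: {slots} slots ({slots - n} empty)");
```

    For the question's 16809 entries the bound allows 28 levels, i.e. 2^28 - 1 = 268,435,455 slots, while a node-based tree needs exactly 16809 nodes.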

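    Odds are, the extra space in the C5 TreeDictionary comes from its internal layout: either an array-backed tree with empty slots, or per-node overhead, since each node of a linked tree is a separate heap object with its own object header, two child references and balancing bookkeeping.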

    What gives? Is there a point where BSTs are better than dictionaries?

    Dictionaries have some undesirable properties:

    • There may not be a large enough contiguous block of memory to hold your dictionary, even if its memory requirements are much less than the total available RAM.

    • Evaluating the hash function can take an arbitrarily long time. Take strings: if you use Reflector to examine the System.String.GetHashCode method, you'll notice that hashing a string always takes O(n) time in the string's length, so it can take considerable time for very long strings. On the other hand, comparing two strings is often faster than hashing them, since the comparison can stop at the first character that differs. It's wholly possible for tree inserts to be faster than dictionary inserts if hash code evaluation takes too long.

      • Int32.GetHashCode simply returns the integer's own value, so you'd be hard-pressed to find a case where a hash table with int keys is slower than a tree dictionary.
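    The cost argument about strings can be made concrete by counting characters read. This sketch uses a simple polynomial hash as a stand-in; it is an assumption for illustration and not the actual String.GetHashCode algorithm. HashString and CompareWithCount are hypothetical names.

```csharp
using System;

// Illustration only: a polynomial string hash (not String.GetHashCode's real
// algorithm) must read every character, while an ordinal comparison can stop
// at the first character that differs.
static (int Hash, int CharsRead) HashString(string s)
{
    int hash = 17;
    foreach (char c in s)
        hash = unchecked(hash * 31 + c);
    return (hash, s.Length);
}

static (int Order, int CharsRead) CompareWithCount(string a, string b)
{
    int common = Math.Min(a.Length, b.Length);
    for (int i = 0; i < common; i++)
        if (a[i] != b[i])
            return (a[i].CompareTo(b[i]), i + 1); // decided after i + 1 chars
    return (a.Length.CompareTo(b.Length), common);
}

string x = "a" + new string('z', 1_000_000);
string y = "b" + new string('z', 1_000_000);
Console.WriteLine($"Hash reads {HashString(x).CharsRead} chars");          // 1000001
Console.WriteLine($"Compare reads {CompareWithCount(x, y).CharsRead} chars"); // 1
```

    A tree lookup performs about log2(n) such comparisons, each of which may stop early, whereas a hash lookup always pays for one full pass over the key before it even reaches a bucket.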

    RB Trees have some desirable properties:

    • You can find/remove the Min and Max elements in O(log n) time, compared to O(n) time using a dictionary.

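    • If a tree is implemented with linked nodes rather than an array, it allocates exactly one node per item and never has empty slots, so it can be more space-efficient than a dictionary, although per-node object overhead can easily cancel that advantage out.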

    • Also, it's ridiculously easy to write immutable versions of trees that support insert/lookup/delete in O(log n) time. Dictionaries do not adapt well to immutability, since you need to copy the entire internal array for every operation (actually, I have seen some array-based implementations of immutable finger trees, a kind of general-purpose dictionary data structure, but the implementation is very complex).

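    • You can traverse all the elements in a tree in sorted order in O(n) time, using only O(log n) extra space for the traversal stack (or constant space if the nodes store parent pointers), whereas you'd need to dump a hash table into an array and sort it, at O(n log n) cost, to get the same effect.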
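    The tree properties listed above can be seen with .NET's own tree-based collections, used here as stand-ins for C5: SortedSet<T> (a red-black tree), SortedDictionary<TKey, TValue>, and ImmutableSortedDictionary<TKey, TValue> (an immutable balanced binary tree). This sketch assumes a .NET (Core) target where System.Collections.Immutable ships with the framework; on .NET Framework it is a separate NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

var keys = new[] { 42, 7, 19, 3, 88 };

// Min/Max: the tree walks one spine; the hash set must scan every key.
var tree = new SortedSet<int>(keys);
var hash = new HashSet<int>(keys);
Console.WriteLine($"{tree.Min} {tree.Max}");      // 3 88
Console.WriteLine($"{hash.Min()} {hash.Max()}");  // 3 88, via an O(n) LINQ scan

// Sorted traversal: enumerating a SortedDictionary yields keys in order,
// while a Dictionary has to be copied out and sorted first.
var sortedMap = new SortedDictionary<int, float>();
var hashMap = new Dictionary<int, float>();
foreach (int k in keys) { sortedMap[k] = k / 2f; hashMap[k] = k / 2f; }
Console.WriteLine(string.Join(",", sortedMap.Keys));                   // 3,7,19,42,88
Console.WriteLine(string.Join(",", hashMap.Keys.OrderBy(key => key))); // same, after a sort

// Immutability: Add returns a new version and leaves the old one untouched,
// sharing most of its nodes instead of copying the whole structure.
var v1 = ImmutableSortedDictionary<int, float>.Empty.Add(1, 1f);
var v2 = v1.Add(2, 2f);
Console.WriteLine($"{v1.Count} {v2.Count}"); // 1 2
```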

    So, the choice of data structure really depends on what properties you need. If you just want an unordered bag and can guarantee that your hash function evaluates quickly, go with a .NET Dictionary. If you need an ordered bag or have a slow-running hash function, go with TreeDictionary.