data-structures hash hashtable sparsehash

What is the main implementation idea behind a sparse hash table?


Why does the Google sparsehash open-source library have two implementations: a dense hashtable and a sparse one?


Solution

  • The dense hashtable is your ordinary textbook hashtable implementation.

    The sparse hashtable stores only the elements that have actually been set, divided over a number of arrays. To quote from the comments in the implementation of sparse tables:

    // The idea is that a table with (logically) t buckets is divided
    // into t/M *groups* of M buckets each.  (M is a constant set in
    // GROUP_SIZE for efficiency.)  Each group is stored sparsely.
    // Thus, inserting into the table causes some array to grow, which is
    // slow but still constant time.  Lookup involves doing a
    // logical-position-to-sparse-position lookup, which is also slow but
    // constant time.  The larger M is, the slower these operations are
    // but the less overhead (slightly).
    

    To know which elements of the arrays are set, a sparse table includes a bitmap:

    // To store the sparse array, we store a bitmap B, where B[i] = 1 iff
    // bucket i is non-empty.  Then to look up bucket i we really look up
    // array[# of 1s before i in B].  This is constant time for fixed M.
    

    so that each bucket incurs a bookkeeping overhead of only 1 bit (in the limit).
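
The two quoted comments can be turned into a small sketch. This is not the sparsehash code or its API: the names `SparseGroup`, `SparseTable` and `kGroupSize` are made up for illustration, the group size of 48 is an arbitrary choice, and the values are plain `int`s. It only shows the sparse array of buckets; a hash table would sit on top of it and use hashing and probing to choose the bucket index.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the library's API.
constexpr std::size_t kGroupSize = 48;  // "M" in the quoted comments (assumed value)

// One group of M logical buckets, storing only the buckets that are set.
class SparseGroup {
public:
    // Store `value` in logical bucket `i` of this group.
    void set(std::size_t i, int value) {
        assert(i < kGroupSize);
        const std::size_t pos = rank(i);
        if (bitmap_.test(i)) {
            packed_[pos] = value;  // bucket already occupied: overwrite in place
        } else {
            // New bucket: the packed array grows by one element ("slow but constant time").
            packed_.insert(packed_.begin() + static_cast<std::ptrdiff_t>(pos), value);
            bitmap_.set(i);
        }
    }

    // Pointer to the value in logical bucket `i`, or nullptr if the bucket is empty.
    const int* find(std::size_t i) const {
        assert(i < kGroupSize);
        return bitmap_.test(i) ? &packed_[rank(i)] : nullptr;
    }

    std::size_t stored() const { return packed_.size(); }

private:
    // Number of 1 bits before position i in B: the logical-to-sparse position lookup.
    std::size_t rank(std::size_t i) const {
        std::size_t count = 0;
        for (std::size_t b = 0; b < i; ++b) count += bitmap_.test(b);
        return count;
    }

    std::bitset<kGroupSize> bitmap_;  // B[i] = 1 iff bucket i is non-empty
    std::vector<int> packed_;         // only the non-empty buckets, in bucket order
};

// A table of t logical buckets divided into ceil(t / M) groups.
class SparseTable {
public:
    explicit SparseTable(std::size_t buckets)
        : groups_((buckets + kGroupSize - 1) / kGroupSize) {}

    void set(std::size_t bucket, int value) {
        groups_[bucket / kGroupSize].set(bucket % kGroupSize, value);
    }

    const int* find(std::size_t bucket) const {
        return groups_[bucket / kGroupSize].find(bucket % kGroupSize);
    }

    // Total number of values actually held in memory.
    std::size_t stored() const {
        std::size_t total = 0;
        for (const SparseGroup& g : groups_) total += g.stored();
        return total;
    }

private:
    std::vector<SparseGroup> groups_;
};
```

With this sketch, a `SparseTable` of 100 logical buckets holding three values keeps three `int`s plus one bitmap per group, and `find` on an unset bucket returns `nullptr`. The `rank` loop here scans bit by bit for clarity; it is bounded by the group size, which is why the quoted comment can call the lookup constant time for fixed M, and a faster bit-counting method could be substituted without changing the idea.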