Search code examples
haskellhashmaphashtableasymptotic-complexityhackage

Does a useful Haskell HashMap/HashTable/Dictionary library exist?


I'm looking for a monad-free, constant access query O(1) associative array.

Consider the hypothetical type:

data HT k v = ???

I want to construct an immutable structure once:

fromList :: Foldable t, Hashable k => t (k,v) -> HT k v

I want to subsequently query it repeatedly with constant time access::

lookup :: Hashable k => HT k v -> k -> Maybe v

There appears to be two candidate libraries which fall short:

unordered-containers

unordered-containers contains both strict and lazy variants of the type HashMap. Both HashMaps have O(log n) queries as documented by the lookup function. This query access time appears to be due to the construction of the HashMap types, which have an internal tree structure allowing for O(log n) insert functionality. An understandable design trade off for many use-cases, but since I don't need a mutable HashMap this tradeoff hampers my use-case.

hashtables

hashtables contains a HashTable type-class and three instance types with varying table constructions strategies. This library's type-class defines a constant time O(1) lookup function definition, but it is eternally embedded in the ST monad. There is no way to "freeze" the stateful HashTable implementations and have a lookup function that is not embedded of a stateful monad. The library's type-class interface is well designed when the entire computation is wrapped in a state monad, but this design is unsuitable for my use-case.


Does there exist some other library which defines types and functions which can construct an immutable constant access query O(1) associative array that is not embedded in a stateful monad?

Does there exist some way to wrap or modify these existing hashing-based libraries to produce an immutable constant access query O(1) associative array that is not embedded in a stateful monad?


Solution

  • The library you want is… unordered-containers. Or just plain old Data.Map from containers, if you’d prefer.

    The note in the unordered-containers documentation explains why you shouldn’t worry about the O(log n) time complexity for lookups:

    Many operations have a average-case complexity of O(log n). The implementation uses a large base (i.e. 16) so in practice these operations are constant time.

    This is a common practice with certain kinds of functional data structures because it allows good sharing properties while also having good time complexities. log16 still produces very small numbers even for very large n, so you can almost always treat those complexities as “effectively constant time”.

    If this is ever a bottleneck for your application, sure, go with something else, but I find that highly unlikely. After all, log16(1,000,000) is a little under 5, so your lookup time is not going to grow very quickly. Processing all that data is going to take up much more time than the overhead of the lookup.

    As always: profile first. If you somehow have a problem that absolutely needs the fastest possible hash map in the world, you might need an imperative hash map, but for every case I’ve ever had, the functional ones work just fine.