Search code examples
c++hashunordered-mapboost-unordered

C++ some questions on boost::unordered_map & boost::hash


I've only recently started dwelling into boost and it's containers, and I read a few articles on the web and on stackoverflow that a boost::unordered_map is the fastest performing container for big collections. So, I have this class State, which must be unique in the container (no duplicates) and there will be millions if not billions of states in the container. Therefore I have been trying to optimize it for small size and as few computations as possible. I was using a boost::ptr_vector before, but as I read on stackoverflow a vector is only good as long as there are not that many objects in it. In my case, the State descibes sensorimotor information from a robot, so there can be an enormous amount of states, and therefore fast lookup is of topemost priority. Following the boost documentation for unordered_map I realize that there are two things I could do to speed things up: use a hash_function, and use an equality operator to compare States based on their hash_function. So, I implemented a private hash() function which takes in State information and using boost::hash_combine, creates an std::size_t hash value. The operator== compares basically the state's hash values. So:

  • is std::size_t enough to cover billions of possible hash_function combinations ? In order to avoid duplicate states I intend to use their hash_values.

  • When creating a state_map, should I use as key the State* or the hash value ? i.e: boost::unordered_map<State*,std::size_t> state_map; Or boost::unordered_map<std::size_t,State*> state_map;

  • Are the lookup times with a boost::unordered_map::iterator = state_map.find() faster than going through a boost::ptr_vector and comparing each iterator's key value ?

  • Finally, any tips or tricks on how to optimize such an unordered map for speed and fast lookups would be greatly appreciated.

EDIT: I have seen quite a few answers, one being not to use boost but C++0X, another not to use an unordered_set, but to be honest, I still want to see how boost::unordered_set is used with a hash function. I have followed boost's documentation and implemented, but I still cannot figure out how to use the hash function of boost with the ordered set.


Solution

  • This is a bit muddled.

    • What you say are not "things that you can do to speed things up"; rather, they are mandatory requirements of your type to be eligible as the element type of an unordered map, and also for an unordered set (which you might rather want).

    • You need to provide an equality operator that compares objects, not hash values. The whole point of the equality is to distinguish elements with the same hash.

    • size_t is an unsigned integral type, 32 bits on x86 and 64 bits on x64. Since you want "billions of elements", which means many gigabytes of data, I assume you have a solid x64 machine anyway.

    • What's crucial is that your hash function is good, i.e. has few collisions.

    • You want a set, not a map. Put the objects directly in the set: std::unordered_set<State>. Use a map if you are mapping to something, i.e. states to something else. Oh, use C++0x, not boost, if you can.

    • Using hash_combine is good.


    Baby example:

    struct State
    {
      inline bool operator==(const State &) const;
      /* Stuff */
    };
    
    namespace std
    {
      template <> struct hash<State>
      {
        inline std::size_t operator()(const State & s) const
        {
          /* your hash algorithm here */
        }
      };
    }
    
    std::size_t Foo(const State & s) { /* some code */ }
    
    int main()
    {
      std::unordered_set<State> states; // no extra data needed
      std::unordered_set<State, Foo> states; // another hash function
    }