Search code examples
c++c++11hashunordered-mapconst-iterator

How can I hash a std::unordered_map::const_iterator?


Do you remember my prior question: What is causing data race in std::async here?
Even though I successfully parallelized this program, it still ran too slowly to be practical.
So I tried to improve the data structure representing a Conway's Game of Life pattern.
Brief explanation of the new structure:

class pattern {
    // NDos::Lifecell represents a cell by x and y coordinates.
    // NDos::Lifecell is equality comparable, and has std::hash specialization.
private:
    std::unordered_map<NDos::Lifecell, std::pair<int, bool>> cells_coor;
    std::unordered_set<decltype(cells_coor)::const_iterator> cells_neigh[9];
    std::unordered_set<decltype(cells_coor)::const_iterator> cells_onoff[2];
public:
    void insert(int x, int y) {
        // if coordinate (x,y) isn't already ON,
        // turns it ON and increases the neighbor's neighbor count by 1.
    }
    void erase(int x, int y) {
        // if coordinate (x,y) isn't already OFF,
        // turns it OFF and decreases the neighbor's neighbor count by 1.
    }
    pattern generate(NDos::Liferule rule) {
        // this advances the generation by 1, according to the rule.
        // (For example here, B3/S23)
        pattern result;
        // inserts every ON cell with 3 neighbors to result.
        // inserts every OFF cell with 2 or 3 neighbors to result.
        return result;
    }
    // etc...
};

In brief, pattern contains the cells. It contains every ON cells, and every OFF cells that has 1 or more ON neighbor cells. It can also contain spare OFF cells.
cells_coor directly stores the cells, by using their coordinates as keys, and maps them to their number of ON neighbor cells (stored as int) and whether they are ON (stored as bool).

cells_neigh and cells_onoff indirectly stores the cells, by the iterators to them as keys.
The number of ON neighbor of a cell is always 0 or greater and 8 or less, so cells_neigh is a size 9 array.
cells_neigh[0] stores the cells with 0 ON neighbor cells, cells_neigh[1] stores the cells with 1 ON neighbor cell, and so on.
Likewise, a cell is always either OFF or ON, so cells_onoff is a size 2 array.
cells_onoff[false] stores the OFF cells, and cells_onoff[true] stores the ON cells.
Cells must be inserted to or erased from all of cells_coor, cells_neigh and cells_onoff. In other words, if a cell is inserted to or erased from one of them, it must be so also for the others. Because of this, the elements of cells_neigh and cells_onoff is std::unordered_set storing the iterators to the actual cells, enabling fast access to the cells by a neighbor count or OFF/ON state.

If this structure works, the insertion function will have average time complexity of O(1), the erasure also O(1), and the generation O(cells_coor.size()), which are great improval of time complexity from the prior structure.

But as you see, there is a problem: How can I hash a std::unordered_map::const_iterator?
std::hash prohibits a specialization for them, so I have to make a custom one.
Taking their address won't work, as they are usually acquired as rvalues or temporaries.
Dereferencing them also won't work, as there are multiple cells that have 0 ON neighbor cells, or multiple cells that is OFF, etc.
So what can I do? If I can't do anything, cells_neigh and cells_onoff will be std::vector or something, sharply degrading the time complexity.


Solution

  • Short story: this won't work (really well)(*1). Most of the operations that you're likely going to perform on the map cells_coor will invalidate any iterators (but not pointers, as I learned) to its elements.

    If you want to keep what I'd call different "views" on some collection, then the underlying container storing the actual data needs to be either not modified or must not invalidate its iterators (a linked list for example).

    Perhaps I'm missing something, but why not keep 9 sets of cells for the neighbor counts and 2 sets of cells for on/off? (*2) Put differently: for what do you really need that map? (*3)


    (*1): The map only invalidates pointers and iterators when rehashing occurs. You can check for that:

    // Before inserting
    (map.max_load_factor() * map.bucket_count()) > (map.size() + 1)
    

    (*2): 9 sets can be reduced to 8: if a cell (x, y) is in none of the 8 sets, then it would be in the 9th set. Thus storing that information is unnecessary. Same for on/off: it's enough to store cells that are on. All other are off.

    (*3): Accessing the number of neighbours without using the map but only with sets of cells, kind of pseudo code:

    unsigned number_of_neighbours(Cell const & cell) {
      for (unsigned neighbours = 9; neighbours > 0; --neighbours) {
        if (set_of_cells_with_neighbours(neighbours).count() == 1) {
          return neighbours;
        }
      }
      return 0;
    }
    

    The repeated lookups in the sets could of course destroy actual performance, you'd need to profile that. (Asymptotic runtime is unaffected)