Search code examples
c++hashunordered-setconst-correctness

insert into a C++ unordered_map with known hash-value


I have a "table" that is a std::unorderd_set of, basically, std::vector<int> with a hash-function that returns the XOR of the hash-values of all elements in the vector. Thus, the hash-value of a vector depends only on the elements, not on their order.

Now, I am implementing a dynamic programming on said table that is supposed to first check if a vector already is in the table and, if not, compute a table entry for it. This computation, however, might change the order of the elements of the vector. So here's what I want to do, roughly:

using Entry = vector<int>;
using Table = unordered_set<Entry, set_hash>; // set_hash is the XOR-based hasher
Table dp_table;

const Entry& query(const vector<int>& q) {
  auto [iter, success] = dp_table.emplace(q);
  // alternatively, I can dp_table.find() here, computing the hash of q
  if(success) {
    compute_entry(*iter); // I want to permute *iter here (doesn't change its hash)
    // alternatively, I can dp_table.emplace() here, computing the hash of q a second time
  }
  return *iter;
}

Now, the way I wrote it here doesn't work because *iter is const. Understandable, as the unordered_set cannot be sure my modification doesn't change the hash (but I know it doesn't). Here's possible solutions and why I don't like them:

  1. I could use the "alternatively"-way, but that means I have to repeat the (costly) hash-calculation which I would really like to avoid. I'd like to be able to tell the unordered_set "hey, this is the hash-value of that thing, trust me", but that's not possible.
  2. I could wrap the Entry in a class in which I declare the vector mutable (as suggested here), but that means I loose const-correctness whenever I deal with elements of the table. Also, I think modifying a const-item is officially UB.
  3. I could modify the hasher such that it stores the hash with each item and just takes this cached hash instead of recomputing the hash whenever it sees an item with a precomputed hash, but that's dangerous if ever I do change the vector and forget to clear the cached hash. Also it takes up additional space in memory, which I want to avoid.

So that's my problem and why I dislike all the solutions I could come up with. Do you have a better idea how to solve this? It's much appreciated :)


Solution

  • You could store the calculated hash value together with the vector:

    class Entry {
    public:
        std::size_t hash() const { return m_hash; }
    private:
        std::vector<int> m_data;
        std::size_t m_hash{};
    };
    
    template <> struct std::hash<Entry> {
        size_t operator()(const Entry& e) const { return e.hash(); }
    };
    

    Then only recalculate the hash value when performing operations that actually affects the hash value.

    When you need to perform an operation on the vector that changes it in any way, just extract it, make the changes and reinsert it:

    std::unordered_set<Entry> s;
    Entry e;
    auto [sit, _] = s.insert(e);
        
    auto node = s.extract(sit);
    node.value().operation_that_does_not_alter_the_hash();
    s.insert(std::move(node));
    

    Since the member function operation_that_does_not_alter_the_hash() won't alter the hash value, it won't need to recalculate it. When reinserting the node into the unordered_set, it'll call the member function hash() which will return the precalculated value.

    If you on the other hand call operation_that_does_alter_the_hash(), that function should then recalculate the hash and store that. Again, reinserting the node will be done in the same way and no extra recalculation of the hash value will be needed.

    Demo