Search code examples
c++hashmapunordered-map

Performance Considerations When Using Large Custom Objects as Keys in C++ unordered_map


I'm working on a C++ project where I've defined a custom class to use as the key in an unordered_map. To do this, I overrode the == operator and provided a custom hash function. The following is a simplified version of my code.

During execution, I noticed that inserting key-value pairs into the unordered_map triggers the copy constructor of my custom class. This leads me to worry about potential performance issues, especially when dealing with large objects that contain significant amounts of data.

Coming from a background in Python and Java, where hash tables store references to objects rather than copying them, I'm unsure about the best practices for using hash tables in C++. Is there a more efficient way to handle large objects as keys to avoid the overhead of copying? What are the recommended practices for optimizing performance in such scenarios?

PS: I don't believe this is an option-based question. When dealing with objects that serve as keys and are significantly large—such as one encapsulating a large-sized vector of integers—there are undoubtedly conventional methods to avoid copying the key with each insertion. One approach I can think of involves using pointers. However, as I am just beginning to learn C++, I am not certain if this is the best method.

#include <iostream>
#include <unordered_map>
#include <functional> 

class MyClass {
private:
    int key;
    std::string value;

public:
    MyClass(int k, const std::string& v) : key(k), value(v) {}

    MyClass(const MyClass& obj) {
        std::cout << "copy constructor called" << std::endl;
        key = obj.key;
        value = obj.value;
    }

    // override operator== 
    bool operator==(const MyClass& other) const {
        return (key == other.key && value == other.value);
    }

    // hash function
    friend std::size_t hash_value(const MyClass& obj) {
        std::size_t seed = 0;
        std::hash<int> hasher;
        std::hash<std::string> strHasher;
        seed ^= hasher(obj.key) + 0x9e3779b9 + (seed<<6) + (seed>>2);
        seed ^= strHasher(obj.value) + 0x9e3779b9 + (seed<<6) + (seed>>2);
        return seed;
    }
};

namespace std {
    template <>
    struct hash<MyClass> {
        std::size_t operator()(const MyClass& obj) const {
            return hash_value(obj);
        }
    };
}

int main() {
    std::unordered_map<MyClass, int> map;

    MyClass c1(1, "1");
    MyClass c2(2, "2");
    MyClass c3(1, "1");
    std::cout << "------step 1-------" << std::endl;
    map[c1] = 1;
    map[c2] = 2;
    std::cout << "------step 2-------" << std::endl;
    std::cout << map[c3] << std::endl;
    std::cout << "------step 3-------" << std::endl;
    map[MyClass(5, "5")] = 5;
    std::cout << "------step 4-------" << std::endl;
    std::cout << (map.find(MyClass(4, "4")) != map.end()) << std::endl;
    std::cout << "------step 5-------" << std::endl;
    auto p = std::make_pair(c1, 5);
    std::cout << "------step 6-------" << std::endl;
    map.insert(p);
    return 0;
}

print result:

------step 1-------
copy constructor called
copy constructor called
------step 2-------
1
------step 3-------
copy constructor called
------step 4-------
0
------step 5-------
copy constructor called
------step 6-------

Solution

  • You can use std::shared_ptr of MyClass as the keys in your unordered_map. This would stop the copy. The MyClass implementation remains mostly same. You can change the hashing as follows.

    namespace std {
    template <>
    struct hash<std::shared_ptr<MyClass>> {
        std::size_t operator()(const std::shared_ptr<MyClass>& obj) const {
            return hash_value(*obj);
        }
    };
    
    template <>
    struct equal_to<std::shared_ptr<MyClass>> {
        bool operator()(const std::shared_ptr<MyClass>& lhs, const 
        std::shared_ptr<MyClass>& rhs) const {
            return *lhs == *rhs;
        }
    };
    

    Here, equal_to<std::shared_ptr<MyClass>> dereferences the shared_ptrs and uses MyClass::operator== to compare the MyClass objects.

    int main() {
        std::unordered_map<std::shared_ptr<MyClass>, int> map;
    
        auto c1 = std::make_shared<MyClass>(1, "1");
        auto c2 = std::make_shared<MyClass>(2, "2");
        auto c3 = std::make_shared<MyClass>(1, "1");
        std::cout << "------step 1-------" << std::endl;
        map[c1] = 1;
        map[c2] = 2;
        std::cout << "------step 2-------" << std::endl;
        std::cout << map[c3] << std::endl;
        std::cout << "------step 3-------" << std::endl;
        map[std::make_shared<MyClass>(5, "5")] = 5;
        std::cout << "------step 4-------" << std::endl;
        std::cout << (map.find(std::make_shared<MyClass>(4, "4")) != 
        map.end()) << std::endl;
        std::cout << "------step 5-------" << std::endl;
        auto p = std::make_pair(c1, 5);
        std::cout << "------step 6-------" << std::endl;
        map.insert(p);
        return 0;
    }
    

    Output:

    ------step 1-------
    ------step 2-------
    1
    ------step 3-------
    ------step 4-------
    0
    ------step 5-------
    ------step 6-------
    

    As I see it, one of the main issues of using the std::shared_ptr as keys to an std::unordered_map is that if there are multiple pointers pointing to the same object, the map will still consider them as different keys.