Search code examples
c++c++11hashmapunordered

Reading from unordered_multiset results in crash


While refactoring some old code a cumbersome multilevel-map developed in-house was replaced by an std::undordered_multiset.

The multilevel-map was something like [string_key1,string_val] . A complex algorithm was applied to derive the keys from string_val and resulted in duplicate string_val being stored in the map but with different keys.

Eventually at some point of the application the multilevel-map was iterated to get the string_val and its number of occurrences.

It replaced was an std::unordered_multilevelset and string_val are just inserted to it. It seems much simpler than having an std::map<std::string,int> and checking-retrieving-updating the counter for every insertion.

What I want to do retrieve the number of occurrences of its inserted element, but I do not have the keys beforehands. So I iterate over the buckets but my program crashes upon creation of the string.

// hash map declaration
std::unordered_multiset<std::string> clevel;

// get element and occurences
for (size_t cbucket = clevel->bucket_count() - 1; cbucket != 0; --cbucket)
{
        std::string cmsg(*clevel->begin(cbucket));
        cmsg += t_str("times=") + \
                std::to_string(clevel->bucket_size(cbucket));
}

I do not understand what is going on here, tried to debug it but I am somehow stack( overflown ?) :) . Program crashes in std::string cmsg(*it);


Solution

  • You should consider how multiset actually works as a hashtable. For example reading this introduction you should notice that hash maps actually preallocate their internal buckets , and the number of buckets is optimized.

    Therefore if you insert element "hello" , you will probably get a number of buckets already created, but only the one corresponding to hash("hello") will actually have an element that you may dereference. The rest will be let's say invalid.

    Dereferencing the iterator to the begin of every bucket results in SEGV which is your case here.

    To remedy this situation you should check every time that begin is not past the end.

    for (size_t cbucket = clevel->bucket_count() - 1; cbucket != 0; --cbucket)
    {
        auto it = clevel->begin(cbucket);
        if (it != clevel->end(cbucket))
        {
            std::string cmsg(*it);
            cmsg += t_str("times=") + \
                    std::to_string(clevel->bucket_size(cbucket));
        }
    }