c++ hash hashmap hashtable rabin-karp

Does the C++ "map" container apply the Rabin-Karp algorithm to consecutive substrings of a string?


I'm working on a code-plagiarism detection method that needs a fingerprint algorithm. The fingerprint algorithm puts every substring of the source code into a hash table. (All substrings have the same length.) As an optimization, it is recommended to use the Rabin-Karp algorithm when putting the fingerprints into the hash table.

For example, for the string abcdef and length 5, we should put the substrings abcde and bcdef into the hash table. Since hashing a string applies a mathematical operation to each of its characters, hashing a large number of substrings from scratch is expensive.

The Rabin-Karp algorithm takes advantage of the substrings being consecutive. It calculates the hash value of the first fingerprint from scratch. For each remaining substring, it derives the hash from the previous substring's hash instead of rehashing every character.
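A minimal sketch of that rolling update, assuming a polynomial hash. The base 257, the modulus 1000000007, and the names direct_hash and rolling_hashes are illustrative choices, not taken from any library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative parameters (assumptions, not from the original post).
constexpr std::uint64_t kBase = 257;
constexpr std::uint64_t kMod = 1000000007ULL;

// Hash one length-k substring from scratch: O(k) work.
std::uint64_t direct_hash(const std::string& s, std::size_t pos, std::size_t k) {
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < k; ++i) {
        h = (h * kBase + static_cast<unsigned char>(s[pos + i])) % kMod;
    }
    return h;
}

// Hash every length-k window of s using the rolling update: O(n) total.
std::vector<std::uint64_t> rolling_hashes(const std::string& s, std::size_t k) {
    assert(k > 0);
    std::vector<std::uint64_t> out;
    if (s.size() < k) return out;

    // kBase^(k-1) mod kMod: the weight of the character leaving the window.
    std::uint64_t high = 1;
    for (std::size_t i = 1; i < k; ++i) high = (high * kBase) % kMod;

    std::uint64_t h = direct_hash(s, 0, k);  // only the first window is hashed in full
    out.push_back(h);
    for (std::size_t i = k; i < s.size(); ++i) {
        std::uint64_t outgoing = static_cast<unsigned char>(s[i - k]);
        std::uint64_t incoming = static_cast<unsigned char>(s[i]);
        h = (h + kMod - (outgoing * high) % kMod) % kMod;  // drop the oldest character
        h = (h * kBase + incoming) % kMod;                 // shift and append the new one
        out.push_back(h);
    }
    return out;
}
```

For abcdef with length 5, rolling_hashes returns two values, equal to direct_hash at positions 0 and 1.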

Does the C++ "map" container automatically apply this algorithm to consecutive substrings in the background? Or should I write my own hash library?


Solution

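  • std::map is an ordered container and does not hash its keys at all. The hash-based container is std::unordered_map (http://www.cplusplus.com/reference/unordered_map/unordered_map/), and its constructor takes a hasher, so you can supply your own hash function.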

    From online docs on std::hash (https://en.cppreference.com/w/cpp/utility/hash):

    The actual hash functions are implementation-dependent and are not required to fulfill any other quality criteria except those specified above.
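One way this could look, as a sketch only: the key stores a hash that the caller has already computed with a rolling update, and the hasher passed to std::unordered_map simply returns it. The Fingerprint, FingerprintHash, FingerprintEq, and index_windows names, along with the base and modulus, are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical key type: a window into the source text together with a
// hash the caller has already computed.
struct Fingerprint {
    const std::string* text;  // source text; must outlive the table
    std::size_t pos;          // start of the window
    std::size_t len;          // window length
    std::uint64_t hash;       // precomputed, e.g. by a rolling hash
};

// Hasher given to std::unordered_map: returns the stored value instead of
// rehashing the characters.
struct FingerprintHash {
    std::size_t operator()(const Fingerprint& f) const noexcept {
        return static_cast<std::size_t>(f.hash);
    }
};

// Equality still compares the characters, so hash collisions stay harmless.
struct FingerprintEq {
    bool operator()(const Fingerprint& a, const Fingerprint& b) const {
        return a.len == b.len &&
               a.text->compare(a.pos, a.len, *b.text, b.pos, b.len) == 0;
    }
};

using FingerprintTable =
    std::unordered_map<Fingerprint, std::vector<std::size_t>,
                       FingerprintHash, FingerprintEq>;

// Maps every distinct length-k window of `text` to its start positions.
// Base and modulus are illustrative assumptions.
FingerprintTable index_windows(const std::string& text, std::size_t k) {
    assert(k > 0);
    const std::uint64_t base = 257, mod = 1000000007ULL;
    FingerprintTable table;
    if (text.size() < k) return table;

    std::uint64_t high = 1;  // base^(k-1) mod mod
    for (std::size_t i = 1; i < k; ++i) high = (high * base) % mod;

    std::uint64_t h = 0;  // hash of the first window, computed in full
    for (std::size_t i = 0; i < k; ++i)
        h = (h * base + static_cast<unsigned char>(text[i])) % mod;

    for (std::size_t pos = 0;; ++pos) {
        table[Fingerprint{&text, pos, k, h}].push_back(pos);
        if (pos + k >= text.size()) break;
        std::uint64_t out = static_cast<unsigned char>(text[pos]);
        std::uint64_t in = static_cast<unsigned char>(text[pos + k]);
        h = (h + mod - (out * high) % mod) % mod;  // drop the oldest character
        h = (h * base + in) % mod;                 // append the new one
    }
    return table;
}
```

The keys hold a pointer into the source text rather than a copy, so the string passed to index_windows has to stay alive for as long as the table is used. Equality is still O(k) when two windows land in the same bucket with equal hashes; only the hashing step is made incremental.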