Tags: c++, string, casting, size-t, c-str

Function call to c_str() vs const char* in hash function


I was looking at hash functions on Stack Overflow when I found one that was pretty interesting. It casts a const char* to a size_t* and then dereferences it; the result is then bit-shifted to a certain precision. This works for a const char* pointing at a string literal, producing the same value each time. However, when I use an actual string and call c_str() instead, the two values do not match. Furthermore, the string version produces a different value on each run of the code. Does anyone have an idea why this is occurring?

const string l = "BA";
const char* k = l.c_str();
const char* p = "BA";
cout << k << " " << *((size_t*)k) << endl;
cout << p << " " << *((size_t*)p) << endl;

Run 1:

BA 140736766951746
BA 7162260525311607106

Run 2:

BA 140736985055554
BA 7162260525311607106

Original question: Have a good hash function for a C++ hash table?


Solution

  • *((size_t*)k) causes undefined behaviour by violating the strict aliasing rule. This code is only valid if k actually points to an object of type size_t.

    Because the behaviour is undefined, seeing weird numbers is one possible result (as is anything else).


    I guess you intended something akin to:

    size_t x;
    memcpy(&x, k, sizeof x);
    cout << k << " " << x << '\n';
    

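    It should now be clear what the problem is: your string contains only 3 characters (2 plus the null terminator), yet the code attempts to read sizeof(size_t) bytes (typically 8), which also causes undefined behaviour. Whatever bytes happen to follow the terminator end up in the result, which is presumably why the literal and the string's buffer give different numbers.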