Search code examples
chashx8632-bitinteger-overflow

A hash function like from K&R book


Consider this function:

unsigned hash(char *s)
{
  char *p;
  unsigned hashval;
  for(p = s; *p; p++)
    hashval = *p + 31 * hashval;
  return hashval;
}

How can I measure how many bytes in s will returns a wrong result,such as overflow? I'm on 32-bit platform.


Solution

  • If you change it to read

    unsigned hash(const char *s)
    {
      const unsigned char *p;
      unsigned hashval = 0;
      for (p = (const unsigned char *) s; *p; p++)
        hashval = *p + 31u * hashval;
      return hashval;
    }
    

    then there is no longer any possibility of undefined behavior due to integer overflow, because all types involved in arithmetic are unsigned, so everything wraps mod 2n (where n is the width of unsigned in bits). I have also fixed the use of an uninitialized variable, and made s and p const, which may improve optimization and/or catch mistakes in the body of the function.

    (I'm not remembering the exact arithmetic conversion rules right now; it might not have been possible in the first place. However, writing it this way makes it obviously impossible.)

    BTW, there are much better hash functions known nowadays: if you don't have a strong reason to do otherwise I recommend using SipHash.