Consider this function:
unsigned hash(char *s)
{
    char *p;
    unsigned hashval;

    for (p = s; *p; p++)
        hashval = *p + 31 * hashval;
    return hashval;
}
How can I measure how many bytes of s it takes before this returns a wrong result, such as on overflow? I'm on a 32-bit platform.
If you change it to read
unsigned hash(const char *s)
{
    const unsigned char *p;
    unsigned hashval = 0;

    for (p = (const unsigned char *) s; *p; p++)
        hashval = *p + 31u * hashval;
    return hashval;
}
then there is no longer any possibility of undefined behavior due to integer overflow, because all types involved in the arithmetic are unsigned, so everything wraps mod 2^n (where n is the width of unsigned in bits). I have also fixed the use of an uninitialized variable, and made s and p const, which may improve optimization and/or catch mistakes in the body of the function.
(I'm not remembering the exact arithmetic conversion rules right now; the overflow might not have been possible in the first place, since the unsigned operand would force the arithmetic to be done in unsigned anyway. However, writing it this way makes it obviously impossible.)
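As a quick demonstration of the wrap-around guarantee (a minimal standalone program, just to illustrate the mod-2^n point):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned x = UINT_MAX;  /* largest value an unsigned can hold */

    x = x + 1u;             /* well-defined: wraps to 0, i.e. arithmetic mod 2^n */
    printf("%u\n", x);      /* prints 0 */
    return 0;
}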
BTW, there are much better hash functions known nowadays: if you don't have a strong reason to do otherwise, I recommend using SipHash.
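For example, using the reference implementation from https://github.com/veorq/SipHash, a keyed string hash might look like the sketch below (the siphash() prototype is my assumption based on that repository, so check the header you actually build against; hash_str is just an illustrative wrapper name):

#include <stdint.h>
#include <string.h>

/* Assumed prototype from the SipHash reference implementation;
 * verify it against your copy of the library. */
int siphash(const void *in, size_t inlen, const void *k,
            uint8_t *out, size_t outlen);

uint64_t hash_str(const char *s)
{
    /* In real use the 128-bit key must be secret and random;
     * a fixed key here is for illustration only. */
    static const uint8_t key[16] = {
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
    };
    uint8_t out[8];
    uint64_t h = 0;

    siphash(s, strlen(s), key, out, sizeof out);

    /* Assemble the 8 output bytes into one 64-bit value. */
    for (int i = 7; i >= 0; i--)
        h = (h << 8) | out[i];
    return h;
}

Unlike the simple multiplicative hash above, SipHash is keyed, which protects hash tables against attacker-chosen collisions.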