Search code examples
chashxorfnv

How would you interpret the behaviour of my C Hash function (type of Fowler–Noll–Vo_hash_function)?


I dont understand why the interger Value "hash" is getting lower in/after the 3 loop.

I would guess this happen because the uint limitation is 2,147,483,647. BUT... when i try to go step by step the value is equal to 2146134658?. I´m not that good in math but it should be lower than the limitation.

#define FNV_PRIME_32 16777619
#define FNV_OFFSET_32 2166136261U

unsigned int hash_function(const char *string, unsigned int size)
{
    unsigned int str_len = strlen(string);

    if (str_len == 0) exit(0);

    unsigned int hash = FNV_OFFSET_32;

    for (unsigned int i = 0; i < str_len; i++)
    {
        hash = hash ^ string[i]; 
        // Multiply by prime number found to work well
        hash = hash * FNV_PRIME_32;
        if (hash > 765010506)
            printf("YO!\n");
        else
            printf("NOO!\n");
    }
    return hash % size;
}  

If you are wondering this if statement is only for me.

if (hash > 765010506)
    printf("YO!\n");
else
    printf("NOO!\n");

765010506 is the value for hash after the next run through the loop.


Solution

  • I dont understand why the interger Value "hash" is getting lower in/after the 3 loop.

    All unsigned integer arithmetic in C is modular arithmetic. For unsigned int, it is modulo UINT_MAX + 1; for unsigned long, modulo ULONG_MAX + 1, and so on.

    (a modulo m means the remainder of a divided by m; in C, a % m if both a and m are unsigned integer types.)

    On many current architectures, unsigned int is a 32-bit unsigned integer type, with UINT_MAX == 4294967295.

    Let's look at what this means in practice, for multiplication (by 65520, which happens to be an interesting value; 216 - 16):

    unsigned int  x = 1;
    int           i;
    for (i = 0; i < 10; i++) {
        printf("%u\n", x);
        x = x * 65520;
    }
    

    The output is

    1
    65520
    4292870400
    50327552
    3221291008
    4293918720
    16777216
    4026531840
    0
    0
    

    What? How? How come the result ends up zero? That cannot happen!

    Sure it can. In fact, you can show mathematically that it happens eventually whenever the multiplier is even, and the modulo is with respect to a power of two (232, here).

    Your particular multiplier is odd, however; so, it does not suffer from the above. However, it still wraps around due to the modulo operation. If we retry the same with your multiplier, 16777619, and a bit longer sequence,

    unsigned int  x = 1;
    int           i;
    for (i = 0; i < 20; i++) {
        printf("%u\n", x);
        x = x * 16777619;
    }
    

    we get

    1
    16777619
    637696617
    1055306571
    1345077009
    1185368003
    4233492473
    878009595
    1566662433
    558416115
    1485291145
    3870355883
    3549196337
    924097827
    3631439385
    3600621915
    878412353
    2903379027
    3223152297
    390634507
    

    In fact, it turns out that this sequence is 1,073,741,824 iterations long (before it repeats itself), and will never yield 0, 2, 4, 5, 6, 7, 8, 10, 12, 13, 14, or 15, for example -- that is, if it starts from 1. It even takes 380 iterations to get a result smaller than 16,777,619 (16,689,137).

    For a hash function, that is okay. Each new nonzero input changes the state, so the sequence is not "locked". But, there is no reason to expect the hash value increases monotonically as the length of the hashed data increases; it is much better to assume it is "roughly random" instead: not really random, as it depends on the input only, but also not obviously regular-looking.