I dont understand why the interger Value "hash" is getting lower in/after the 3 loop.
I would guess this happen because the uint limitation is 2,147,483,647
.
BUT... when i try to go step by step the value is equal to 2146134658
?.
I´m not that good in math but it should be lower than the limitation.
#define FNV_PRIME_32 16777619
#define FNV_OFFSET_32 2166136261U
unsigned int hash_function(const char *string, unsigned int size)
{
unsigned int str_len = strlen(string);
if (str_len == 0) exit(0);
unsigned int hash = FNV_OFFSET_32;
for (unsigned int i = 0; i < str_len; i++)
{
hash = hash ^ string[i];
// Multiply by prime number found to work well
hash = hash * FNV_PRIME_32;
if (hash > 765010506)
printf("YO!\n");
else
printf("NOO!\n");
}
return hash % size;
}
If you are wondering this if statement is only for me.
if (hash > 765010506)
printf("YO!\n");
else
printf("NOO!\n");
765010506
is the value for hash after the next run through the loop.
I dont understand why the interger Value "hash" is getting lower in/after the 3 loop.
All unsigned integer arithmetic in C is modular arithmetic. For unsigned int
, it is modulo UINT_MAX + 1
; for unsigned long
, modulo ULONG_MAX + 1
, and so on.
(a
modulo m
means the remainder of a
divided by m
; in C, a % m
if both a
and m
are unsigned integer types.)
On many current architectures, unsigned int
is a 32-bit unsigned integer type, with UINT_MAX == 4294967295
.
Let's look at what this means in practice, for multiplication (by 65520, which happens to be an interesting value; 216 - 16):
unsigned int x = 1;
int i;
for (i = 0; i < 10; i++) {
printf("%u\n", x);
x = x * 65520;
}
The output is
1
65520
4292870400
50327552
3221291008
4293918720
16777216
4026531840
0
0
What? How? How come the result ends up zero? That cannot happen!
Sure it can. In fact, you can show mathematically that it happens eventually whenever the multiplier is even, and the modulo is with respect to a power of two (232, here).
Your particular multiplier is odd, however; so, it does not suffer from the above. However, it still wraps around due to the modulo operation. If we retry the same with your multiplier, 16777619
, and a bit longer sequence,
unsigned int x = 1;
int i;
for (i = 0; i < 20; i++) {
printf("%u\n", x);
x = x * 16777619;
}
we get
1
16777619
637696617
1055306571
1345077009
1185368003
4233492473
878009595
1566662433
558416115
1485291145
3870355883
3549196337
924097827
3631439385
3600621915
878412353
2903379027
3223152297
390634507
In fact, it turns out that this sequence is 1,073,741,824 iterations long (before it repeats itself), and will never yield 0, 2, 4, 5, 6, 7, 8, 10, 12, 13, 14, or 15, for example -- that is, if it starts from 1. It even takes 380 iterations to get a result smaller than 16,777,619 (16,689,137).
For a hash function, that is okay. Each new nonzero input changes the state, so the sequence is not "locked". But, there is no reason to expect the hash value increases monotonically as the length of the hashed data increases; it is much better to assume it is "roughly random" instead: not really random, as it depends on the input only, but also not obviously regular-looking.