For a university project I have been assigned the task of creating data structures (such as a min-heap, a hash table, etc.) from scratch. However, the hash table, or more specifically the hash functions, have given me quite some trouble. I have come across the MAD (Multiply, Add, Divide) method, which is basically: h(x) = [(a*x + b) % p] % N, where a, b are random integers, p is a large prime number, and N is the number of buckets in the hash table.
My question is: how (and why) exactly does this function distribute the values evenly across the hash table?
h(x) = [(a*x + b) % p] % N
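(A note on the code sketches below: the question doesn't name a language, so C++ is assumed throughout, and names like `mad_hash` are purely illustrative.) To start, the formula transcribes almost literally into a function - the 64-bit intermediate is only there so `a*x + b` can't overflow for 32-bit-sized `a`, `b` and `p` before the two modulo steps:

```cpp
#include <cstddef>
#include <cstdint>

// h(x) = [(a*x + b) % p] % N  -- a, b random, p a large prime, N the bucket count
std::size_t mad_hash(std::uint32_t x,
                     std::uint64_t a, std::uint64_t b,
                     std::uint64_t p, std::size_t N) {
    return static_cast<std::size_t>(((a * x + b) % p) % N);
}
```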
Let's look at `a*x + b` in isolation first. If you imagine `a` broken down into a sum of powers of two, `a*x` is then the sum of `x` bit-shifted left by a smattering of powers of two, such that each bit in `x` impacts the bit positions that are set in `a`, and some further bits whenever the summation produces carries at particular positions. Adding `b` mixes in another set of random bits: much like XORing would, but with some extra complexity from the carries. If say `x` is a value between 0 and 255, with bits `abcdefgh` (each being 0 or 1), then so far we've got:
```
(a&1 ? abcdefgh    : 0) +
(a&2 ? abcdefgh0   : 0) +
(a&4 ? abcdefgh00  : 0) +
(a&8 ? abcdefgh000 : 0) +
... +               // continues for a&16, a&32 etc.
ABCDEFGHIJKLMNOP    // however many random bits are in "b"
```
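If it helps to see that summation concretely, here's a small self-contained check (an illustration of my own, with arbitrary constants, not something from the MAD definition) that rebuilds `a*x + b` out of shifted copies of `x`:

```cpp
#include <cassert>
#include <cstdint>

int main() {
    std::uint32_t a = 37, b = 0xB5E3, x = 0xAB;    // arbitrary example values

    // Add a copy of x shifted left once for every bit that is set in a...
    std::uint32_t sum = 0;
    for (int bit = 0; bit < 32; ++bit)
        if (a & (1u << bit))
            sum += x << bit;

    // ...then mix in b: this is exactly the same value as a*x + b.
    assert(sum + b == a * x + b);
    return 0;
}
```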
So, in the "1s" column we're summing h
and P
, which might carry into the "2s" column with g
, h
and O
, and on it goes.
If `a` is say 37, which is 32+4+1, then we're adding `x` itself, `x << 2`, and `x << 5`: each bit in `x` thereby impacts more bits in the hash value (this is good; indeed, with a cryptographic-strength hash function, changing any bits in the key - whether a single bit, half of them, or all of them - should pretty much randomly flip about half the bits in the hash value).
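To watch that spreading happen, a quick experiment (again my own sketch, with arbitrary constants) flips one bit of `x` at a time and counts how many bits of `a*x + b` change:

```cpp
#include <bitset>
#include <cstdint>
#include <iostream>

int main() {
    std::uint32_t a = 37, b = 0x9E3779B9, x = 0xAB;   // example values only

    for (int bit = 0; bit < 8; ++bit) {
        std::uint32_t y = x ^ (1u << bit);            // flip one input bit
        std::uint32_t diff = (a * x + b) ^ (a * y + b);
        std::cout << "flipping input bit " << bit << " changes "
                  << std::bitset<32>(diff).count() << " bits of a*x + b\n";
    }
}
```

With a small `a` like 37 you'll typically see a few output bits change per flipped input bit; larger, more random `a` values spread the effect further up the word.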
Returning to the full formula, let's imagine we skipped the `% p` and had just `% N`, but the current table size is a power of two: `% N` is then equivalent to a bitwise AND keeping just some number of the less-significant bits. Put another way, it's throwing away a lot of the randomness we've built up in the more significant bits of our `a * x + b` calculation. So, to make the hash function safe to use with any number of buckets, we can introduce `% p` first, which means that if there are patterns in the hash value related to power-of-two positions from the summation step, they're effectively scattered across random positions in the 0..p-1 range.
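Here's a tiny demonstration of that power-of-two point (my own illustration): when `N` is a power of two, `% N` keeps only the low bits, exactly as a bitwise AND would, so every higher bit of the hash is discarded:

```cpp
#include <cassert>
#include <cstdint>

int main() {
    const std::uint32_t N = 1024;              // a power-of-two bucket count
    for (std::uint32_t h = 0; h < 1000000; ++h)
        assert(h % N == (h & (N - 1)));        // only the low 10 bits survive
}
```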
Consider say a hash value between 0 and 255 - if `N` was 200, we'd be twice as likely to hash to a bucket in the 0..55 range. To make this effect less significant, we want the hash value to have many more bits than the MOD value, and this principle applies in a layered way to the values we should choose for `p` and `N`:
- `a * x + b` values should tend to be significantly larger than `p`, and be spread across a range much larger than `p`, so `% p` spreads them more evenly across the buckets, but
- `p` should be much larger than `N`, so we don't have low-indexed buckets with significantly higher collision probabilities (which is especially bad if you're using linear probing to resolve collisions).
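To make the earlier 0..255-into-200-buckets example concrete, here's a short count (my own sketch) showing the uneven traffic: buckets 0..55 each receive two of the 256 possible hash values, while the rest receive only one.

```cpp
#include <iostream>
#include <vector>

int main() {
    const int N = 200;                     // bucket count
    std::vector<int> hits(N, 0);
    for (int h = 0; h < 256; ++h)          // every possible 8-bit hash value
        ++hits[h % N];

    int doubled = 0;
    for (int b = 0; b < N; ++b)
        if (hits[b] == 2) ++doubled;
    std::cout << doubled << " buckets (0.." << doubled - 1
              << ") get twice the traffic of the remaining " << N - doubled << "\n";
}
```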
For example, if we wanted to support values of `N` up to 2^24, and we're doing these calculations with 32-bit unsigned integers so `a` and `b` have random values in that range, we might split the difference and pick a prime of around 2^28.
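Putting it all together, here's a minimal sketch of a MAD hash along those lines (my own code and naming, not a definitive implementation; the prime is found at runtime with a simple trial-division search rather than hard-coded, and 64-bit arithmetic keeps `a*x + b` from wrapping before the modulos are applied):

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Trial division is plenty fast for a one-off search near 2^28.
bool is_prime(std::uint64_t n) {
    if (n < 2) return false;
    for (std::uint64_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

class MadHash {
public:
    explicit MadHash(std::size_t bucket_count)      // N, intended to go up to ~2^24
        : N_(bucket_count) {
        p_ = 1ull << 28;                            // first prime at or above 2^28
        while (!is_prime(p_)) ++p_;

        std::mt19937_64 gen(std::random_device{}());
        std::uniform_int_distribution<std::uint64_t> dist(1, 0xFFFFFFFFull);
        do { a_ = dist(gen); } while (a_ % p_ == 0);  // random 32-bit a, not a multiple of p
        b_ = dist(gen);                               // random 32-bit b
    }

    std::size_t bucket_for(std::uint32_t x) const {
        // a, b and x all fit in 32 bits, so a*x + b fits in 64 bits with no wrap-around.
        return static_cast<std::size_t>(((a_ * x + b_) % p_) % N_);
    }

private:
    std::size_t N_;
    std::uint64_t p_, a_, b_;
};
```

Usage is then just `MadHash h(200000); std::size_t bucket = h.bucket_for(key);`.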