I am new to C and am confused by this hashing code:
#include <stdint.h>
// defines uint32/64_t as unsigned 32/64-bit integer.
uint32_t hash(uint32_t x, uint32_t m, uint64_t a, uint64_t b) {
// hashes x strongly universally into the range [m]
// using the random seeds a and b.
return (((a*x+b)>>32)*m)>>32;
}
The result is supposed to be in the range 0...m-1 but I don't see how that is ensured.
x and m are 32 bit integers while a and b are uniformly random 64 bit integers. (a*x+b)>>32
only uses the 32 least significant bits, so multiplied with the 32-bit integer m
, we get the exact product in 64 bits with no overflow. I still don't see why the result is in the range 0..m-1 though.
The type of a*x
is uint64_t
, so it produces at most a 64-bit result (barring an exotic C implementation with int
wider than 64 bits). Then >>32
removes 32 bits, so the result is less than 232. Then multiplying by m
produces something less than 232•m
. Then applying >>32
to that produces something less than 232•m
/232 = m
.
So the result is less than m
.
The type of a
is uint64_t
and the type of x
is uint32_t
. In a*x
, x
is converted to uint64_t
, and the multiplication is performed with uint64_t
.
The type uint64_t
can represent values in [0, 264), so the result is in that interval. (“[a, b)” means the interval includes the endpoint a and does not include the endpoint b.) If the mathematical result of the multiplication is outside that interval, the computed result equals the mathematical result reduced by subtracting some integer multiple of 264—higher bits are discarded.
Then b
is added to this. The type of b
is uint64_
, so the result is again in [0, 264).
Then >>32
is applied. This is defined to produce the result of dividing by 232 and discarding the fractional part. So the result must be in [0, 232). Equivalently, you can think of it as discarding 32 bits from a 64-bit number (possibly with leading 0 bits), yielding a 32-bit number.
Then we multiply by m
. Multiplying a number in [0, 232) by m
produces a number in [0•m
, 232•m
). (This result fits in a uint64_t
because the type of m
is uint32_t
, so we are multiplying two 32-bit numbers.)
Finally, >>32
is applied again. The result is some number in [0•m
, 232•m
) divided by 232, with the fractional part discarded. So the result is in [0•m
/232, 232•m
/232) = [0, m
).
Therefore, the result is in [0, m
). Since the fractional part is discarded, it must be in [0, m
−1].