Both arc4random_uniform
from OpenBSD
and the PCG
library by Melissa O'Neill
have a similar looking algorithm to generate a non biased unsigned integer value up to an excluding upper bound.
inline uint64_t
pcg_setseq_64_rxs_m_xs_64_boundedrand_r(struct pcg_state_setseq_64 * rng,
uint64_t bound) {
uint64_t threshold = -bound % bound;
for (;;) {
uint64_t r = pcg_setseq_64_rxs_m_xs_64_random_r(rng);
if (r >= threshold)
return r % bound;
}
}
Isn't -bound % bound
always zero? If it's always zero then why have the loop and the if statement at all?
The OpenBSD
has the same thing too.
uint32_t
arc4random_uniform(uint32_t upper_bound)
{
uint32_t r, min;
if (upper_bound < 2)
return 0;
/* 2**32 % x == (2**32 - x) % x */
min = -upper_bound % upper_bound;
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}
return r % upper_bound;
}
Apple's version of arc4random_uniform
has a different version of it.
u_int32_t
arc4random_uniform(u_int32_t upper_bound)
{
u_int32_t r, min;
if (upper_bound < 2)
return (0);
#if (ULONG_MAX > 0xffffffffUL)
min = 0x100000000UL % upper_bound;
#else
/* Calculate (2**32 % upper_bound) avoiding 64-bit math */
if (upper_bound > 0x80000000)
min = 1 + ~upper_bound; /* 2**32 - upper_bound */
else {
/* (2**32 - (x * 2)) % x == 2**32 % x when x <= 2**31 */
min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound;
}
#endif
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}
return (r % upper_bound);
}
Because bound
is a uint64_t
, -bound
is evaluated modulo 264. The result is 264−bound
, not −bound
.
Then -bound % bound
calculates the residue of 264−bound
modulo bound
. This equals the residue of 264 modulo bound
.
By setting threshold
to this and rejecting numbers that are less than threshold
, the routine reduces the accepted interval to 264−threshold
numbers. The result is an interval that has a number of numbers that is a multiple of bound
.
From a number r
selected in that interval, the routine returns r % bound
. Due to the trimming of the interval, there are an equal number of occurrences of each residue, so the result has no bias for any residue over any other.