Search code examples
crandomuniform-distribution

Uniform distribution in arc4random_uniform and PCG


Both arc4random_uniform from OpenBSD and the PCG library by Melissa O'Neill have a similar looking algorithm to generate a non biased unsigned integer value up to an excluding upper bound.

inline uint64_t
pcg_setseq_64_rxs_m_xs_64_boundedrand_r(struct pcg_state_setseq_64 * rng,
                                        uint64_t bound) {
    uint64_t threshold = -bound % bound;
    for (;;) {
        uint64_t r = pcg_setseq_64_rxs_m_xs_64_random_r(rng);
        if (r >= threshold)
            return r % bound;
    }
}

Isn't -bound % bound always zero? If it's always zero then why have the loop and the if statement at all?

The OpenBSD has the same thing too.

uint32_t
arc4random_uniform(uint32_t upper_bound)
{
    uint32_t r, min;

    if (upper_bound < 2)
        return 0;

    /* 2**32 % x == (2**32 - x) % x */
    min = -upper_bound % upper_bound;

    /*
     * This could theoretically loop forever but each retry has
     * p > 0.5 (worst case, usually far better) of selecting a
     * number inside the range we need, so it should rarely need
     * to re-roll.
     */
    for (;;) {
        r = arc4random();
        if (r >= min)
            break;
    }

    return r % upper_bound;
}

Apple's version of arc4random_uniform has a different version of it.

u_int32_t
arc4random_uniform(u_int32_t upper_bound)
{
    u_int32_t r, min;

    if (upper_bound < 2)
        return (0);

#if (ULONG_MAX > 0xffffffffUL)
    min = 0x100000000UL % upper_bound;
#else
    /* Calculate (2**32 % upper_bound) avoiding 64-bit math */
    if (upper_bound > 0x80000000)
        min = 1 + ~upper_bound;     /* 2**32 - upper_bound */
    else {
        /* (2**32 - (x * 2)) % x == 2**32 % x when x <= 2**31 */
        min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound;
    }
#endif

    /*
     * This could theoretically loop forever but each retry has
     * p > 0.5 (worst case, usually far better) of selecting a
     * number inside the range we need, so it should rarely need
     * to re-roll.
     */
    for (;;) {
        r = arc4random();
        if (r >= min)
            break;
    }

    return (r % upper_bound);
}

Solution

  • Because bound is a uint64_t, -bound is evaluated modulo 264. The result is 264bound, not −bound.

    Then -bound % bound calculates the residue of 264bound modulo bound. This equals the residue of 264 modulo bound.

    By setting threshold to this and rejecting numbers that are less than threshold, the routine reduces the accepted interval to 264threshold numbers. The result is an interval that has a number of numbers that is a multiple of bound.

    From a number r selected in that interval, the routine returns r % bound. Due to the trimming of the interval, there are an equal number of occurrences of each residue, so the result has no bias for any residue over any other.