Tags: random, language-agnostic, real-time, bigint

How to generate an unbiased random bigint in constant time


In my embedded project, I have a biginteger class that handles arbitrary-length integers. I would like to generate a random bigint between 0 and an arbitrary maximum, inclusive. Assume I have a quality source of random bytes.

All the implementations I have seen essentially do the same thing:

  1. Generate a random number with the same number of bytes as max,
  2. If it is greater than max, throw it away and generate again.
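
In Python, the loop looks something like this (a sketch only, with the secrets module standing in for my quality source of random bytes, and max treated as inclusive):

    import secrets

    def random_below_or_equal(max_value):
        # The approach described above: draw as many bytes as max occupies,
        # then retry whenever the result exceeds max.
        num_bytes = (max_value.bit_length() + 7) // 8
        while True:
            x = int.from_bytes(secrets.token_bytes(num_bytes), "big")
            if x <= max_value:
                return x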

The problem I see with this implementation is that it could take an awfully long time. Imagine that max = 2^2049 - 1 = (01 FF .. FF). This algorithm will generate 257 bytes, then check that the most significant byte is <= 1. So there is a 254/256 chance it has to generate a whole new 257-byte number; on average it takes 128 attempts (about 32 KiB of random data). In the (admittedly unlikely) worst case, this loop could go on for minutes or years.

My question is:
In the case where the generated number is too large, is there a way to keep most of the bytes I have already generated?
Is it valid to just regenerate the most significant byte, or does that introduce bias? What about shifting the result right one digit?

Is there any way to make the running time deterministic, while still avoiding bias?

--

Another edge case: max = 2^2048 + 1 = (01 00 .. 01). In this case the most significant byte can be non-zero only when all the remaining bytes are zero except the last, which must be 00 or 01; in other words, the only valid values with a non-zero MSB are 2^2048 and 2^2048 + 1 themselves. So most of the time, if the MSB is non-zero, the number will be invalid, and regenerating just that byte (to anything non-zero) can never make it valid. But force-setting it to zero seems wrong also.


Solution

  • The answer is that it is impossible in general to generate an unbiased random integer in [0, n) in constant time. One notable exception is when the source of random numbers produces unbiased random bits and n is a power of 2.

    For instance, assume we have a "true" random generator that can produce unbiased random bits. Then, unless n is a power of 2, there are only two possible ways to proceed (minimal sketches of both follow this list):

    • It can use modulo reduction (or Lemire's multiply-then-shift reduction). This runs in constant time, but introduces a bias: some numbers are slightly more likely to be generated than others.
    • It can use rejection sampling. This introduces no bias, but can run forever in the worst case (even though the expected number of iterations is constant). Many kinds of algorithms fit in this category, including modulo reduction followed by a rejection step (which is necessary if n is not a power of 2), as well as the Fast Dice Roller (which uses random bits).
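
    To make the two options concrete, here are minimal Python sketches (my own illustration, with secrets.randbits standing in for the unbiased bit source):

        import secrets

        def reduce_modulo(n, k=64):
            # Option 1: constant time, but biased whenever 2**k is not
            # divisible by n (some outcomes occur one extra time).
            return secrets.randbits(k) % n

        def rejection_sample(n):
            # Option 2: unbiased, but the retry loop has no worst-case bound.
            k = (n - 1).bit_length()  # smallest width that can hold n - 1
            while True:
                x = secrets.randbits(k)
                if x < n:
                    return x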

    (See my note on integer generating algorithms for a survey of both kinds of algorithms. For a Fast Dice Roller implementation, see another answer of mine.)
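
    For reference, here is a short Python rendering of the Fast Dice Roller (following Lumbroso's description; treat this as a sketch rather than a canonical implementation):

        import secrets

        def fast_dice_roller(n):
            # Builds the result one random bit at a time. Unbiased, and the
            # expected number of bits consumed is close to log2(n), but the
            # loop has no worst-case bound.
            v, c = 1, 0
            while True:
                v, c = v * 2, c * 2 + secrets.randbits(1)
                if v >= n:
                    if c < n:
                        return c          # c is uniform in [0, n)
                    v, c = v - n, c - n   # reject, but reuse leftover randomness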

    In this sense, Knuth and Yao showed in 1976 that any algorithm that produces random integers with a given probability, using only random bits, can be represented as a binary tree, where random bits indicate which way to traverse the tree and each leaf (endpoint) corresponds to an outcome. (Knuth and Yao, "The complexity of nonuniform random number generation", in Algorithms and Complexity, 1976.) In this case, each integer in [0, n) occurs with probability 1/n. And if 1/n has a non-terminating binary expansion (which is the case whenever n is not a power of 2), this binary tree must either:

    • have an "infinite" depth, or
    • include "rejection" leaves at the end of the tree.

    In either case, the algorithm won't run in constant time.
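
    As a concrete example (my illustration): take n = 3. Since 1/3 = 0.010101... in binary never terminates, the simplest such tree flips two fair bits, maps the leaves 00, 01, 10 to the outcomes 0, 1, 2, and turns 11 into a rejection leaf that restarts the traversal:

        import secrets

        def uniform3():
            # One level of the Knuth-Yao-style tree: two coin flips,
            # three outcome leaves, one rejection leaf.
            while True:
                bits = secrets.randbits(2)
                if bits < 3:
                    return bits  # leaves 00, 01, 10 -> outcomes 0, 1, 2
                # bits == 11 is the rejection leaf: descend into a fresh
                # copy of the same tree (unbounded depth in the worst case)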

    Modulo or similar reductions are equivalent to a binary tree in which rejection leaves are replaced with labeled outcomes — but since there are more possible outcomes than rejection leaves, only some of the outcomes can take the place of the rejection leaves, introducing bias. The same kind of binary tree — and the same kind of bias — results if you stop rejecting after a set number of iterations. (See also chapter 15 of Non-Uniform Random Variate Generation by L. Devroye, 1986.)
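
    The bias is easy to quantify in a small case (my example): reduce a uniform 8-bit value modulo 3. Because 256 = 3 * 85 + 1, the outcome 0 occurs 86 times out of 256 while 1 and 2 occur only 85 times each:

        from collections import Counter

        # Exact distribution of (uniform 8-bit value) % 3: the outcome 0
        # absorbs the one extra input, so it is slightly more likely.
        print(Counter(x % 3 for x in range(256)))
        # Counter({0: 86, 1: 85, 2: 85})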

    Therefore: In general, an integer generator can be either unbiased or constant-time, but not both.

    If you can't tolerate the worst case of running forever, then the only thing you can do is set a fixed maximum number of rejections or use a reduction, both of which can introduce bias. However, this bias might be negligible depending on your application (e.g., if the chance the algorithm "fails" is negligible compared to the chance it "succeeds", for the application's purposes). There are also security aspects to random integer generation, which are too complicated to discuss in this answer.
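
    If a hard time bound matters more than perfect uniformity, one possible compromise (a sketch under my own assumptions, not a vetted security primitive) is to cap the number of rejection rounds and fall back to a biased reduction only if every round fails, so the total bias is at most the probability of reaching the fallback:

        import secrets

        def bounded_uniform(n, max_rounds=128):
            # Rejection sampling with a hard cap on iterations. Masking to
            # the minimal bit width makes each round succeed with probability
            # greater than 1/2, so the biased fallback is reached with
            # probability below 2**-max_rounds.
            k = (n - 1).bit_length()
            for _ in range(max_rounds):
                x = secrets.randbits(k)
                if x < n:
                    return x
            return secrets.randbits(k) % n  # biased fallback, astronomically rare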