I have an infinite stream of random bits.
I want to generate a random number between 0 and 999 (inclusive).
Is the best algorithm:
Or is there a better way? Perhaps an algorithm which is guaranteed to consume only a finite number of bits?
In situations like this it is important to make a distinction between "infinite" and "unbounded".
In your case, you'll be able to generate a random number with a finite number of bits with probability 1, but you cannot guarantee an upper-bound on the needed number of bits with probability 1.
We summarise this by saying "The number of bits required is finite but unbounded."
However, you can guarantee relatively-high probability upper-bounds.
Even with the naive strategy of pulling 10 bits, discarding the generated number if it's higher than 1000, and starting over, you have only probability 24/1024 = 0.2% of needing more than 10 bits, probability (24/1024)^2 = 0.05% of needing more than 20 bits, probability (24/1024)^3 = 0.001% of needing more than 30 bits, probability (24/1024)^4 = 0.00003% of needing more than 40 bits, etc.
You can be slightly more efficient by not discarding the extra bits of information when you reject a number, ie by remembering by how much you've exceeded 999. If you exceed 999 that means the number is somewhere between [1000, 1023]. There are 24 different ways to exceed 999, and they are uniformly distributed. Subtract 1000 and you get a uniform random number in [0, 23], which is worth approximately 4.5 bits of randomness, which you can use for your next generated random number.
Here is an implementation of this method in python:
def gen_random(bitstream, n):
x = 0 # random number generated so far
m = 1 # x is always uniform in range [0, m-1]
for bit in bitstream:
x = x * 2 + bit
m *= 2
if m >= n:
if x < n:
return x
else:
x = x - n
m = m - n
from itertools import repeat
from random import getrandbits
bitstream = map(getrandbits, repeat(1))
print('Uniform random number between 0 and 999: ', gen_random(bitstream, 1000))
Note: it terms of practical speed of execution, the above code is inefficient because it calls getrandbits(1)
ten times instead of calling getrandbits(10)
one time - but in terms of number of bits pulled from the stream, it is optimal.