Iterate randomly over (possibly) huge space of bit sequences in Python

So I have a variable space of possible bit sequences, let's say for example all possible sequences of 64 bits. The space of possibilities can become arbitrary small or large, but the solution should work in all cases (e.g. it should work for all sequences of 64 bits just as well as for all of 2 bits). Now I want to iterate n times over this space, getting a random bit string each time (that I haven't encountered before of course).

Edit: As stated in the comments, this basically turns out to be a "pick n out of m items randomly without picking the same item twice"-like problem, in case this formulation makes it easier.

In theory the problem would be solved by creating a list of all possibilities, shuffling it and then picking the first n items of this list. However, since the space of possibilities can become arbitrarily large, this list could become to large for the memory to create it in the first place.

I've also thought of randomly generating a bit string each time and storing the already used ones in a set - but same problem here, since the number n of iterations isn't fixed, the set could become too memory-consuming. And it's also possible for n to be as large as the possibility space, in which case random generating won't really work at some point any way.

What would be the best way to do this?

Solution

TLDR: You can use a PRNG for a collision-free traversal of all numbers that can be converted to a bit-vector of a given length. Use this to select m numbers, then convert them to the target space as desired.

import random
from itertools import islice

def lcg_cycle(bits: int, seed=None):
    """Pseudo-randomly cycle through all unsigned integers of at most ``bits``"""
    n = 2**bits
    current = seed if seed is not None else random.randrange(0, n)
    for _ in range(n):  # run one cycle
       current = (5 * current + 1) % n
       yield current

#                                                | n| |m|
print(*(f'{num:012b}' for num in islice(lcg_cycle(12), 4)))
# => 101111000001 101011000110 010111011111 110101011100
print(*(f'{num:012b}' for num in islice(lcg_cycle(12), 4)))
# => 111010111110 100110110111 000010010100 001011100101
print(*(f'{num:03b}' for num in lcg_cycle(2)))
# => 000 001 010 011
print(*(f'{num:03b}' for num in lcg_cycle(2)))
# => 011 000 001 010

Additional pseudo-randomness can be introduced by randomly shifting this fixed sequence, and using a different conversion from numbers to bits.

Since there are many cheap 1:1 mappings between "bit sequence" and "positive number", we can look at the problem as "pick m numbers out of the range [0,n)". A robust way to do that is to construct some traversal sequence of the entire range, and pick as many values as we need.

For example, for n=8 we could choose the traversal sequence seq = [0, 5, 6, 7, 2, 3, 1, 4]. Then pick m=3 numbers as seq[:3], seq[1:4], or similar (including wrap-around). It is pretty trivial to create additional perceived randomness, e.g. by adding an offset to the numbers, or by changing our number->bits mapping.

def twisted_bits(num, length: int) -> str:
    """Less obvious int -> bin conversion"""
    bits = f"{num:0{length}b}"      # original "unsigned int" bit pattern
    return bits[1::2] + bits[0::2]  # shuffle odd bits to front

def rand8b():
    """Produce a random sequence of the 8-bit pattern"""
    seq = [0, 5, 6, 7, 2, 3, 1, 4]          # chosen by fair roll of dice!
    offset = random.randrange(0, len(seq))
    for item in seq:
        yield twisted_bits((item + offset) % len(seq), 4)

print(*rand8b())  # 0101 0010 0011 0100 0111 0000 0110 0001
print(*rand8b())  # 0001 0110 0111 0000 0011 0100 0010 0101

Now, that reduces our problem to "generate any traversal sequence". The bit-mangling and offset might be enough to use a range for small bit counts, but not for the larger ones that are problematic in the first place. So we want a similarly lazy sequence to traverse the range but in a seemingly random fashion.

Fittingly, that is what a Pseudo Random Number Generator with a full period does. One such generator class that is simple to implement for arbitrary n=2**N (i.e. length of bit vectors) are the linear congruential generators. This is for example part of the CPython dict collision resolution strategy.

def lcg_cycle(bits: int, seed=None):
    n = 2**bits
    current = seed if seed is not None else random.randrange(0, n)
    for _ in range(n):
       current = (5 * current + 1) % n
       yield current

# always generates all values from a random starting point
print(set(lcg_cycle(3)))  # {0, 1, 2, 3, 4, 5, 6, 7}
print(set(lcg_cycle(3)))  # {0, 1, 2, 3, 4, 5, 6, 7}

One can play with the constants, but 5, 1 should appear random enough.

Depending on how much apparent randomness is needed, the LCG might be enough on its own; for few bits, its cycle may be visible. As before, it can be combined with a random offset and non-standard int->bin mapping to obscure its traversal further.

from itertools import islice

def rand_bits(bits: int, count=None):
    """Produce a pseudo-random sequence of fixed-size bit patterns"""
    n = 2 ** bits
    offset = random.randrange(0, n)
    for item in islice(lcg_cycle(bits), count or n):
        yield twisted_bits((item + offset) % n, bits)

print(*rand_bits(32, 4))
# => 11101010011110010000111100100100 11011100110100110001011011101101 10011000100101010011110111011010 01000011011000000000000001111011
print(*rand_bits(32, 4))
# => 11110000110011011111000001101111 00101100111111000001110011111000 01011001111000101111101110100101 00111010011001010101010100000110
print(*rand_bits(32, 4))
# => 00001101101011111010010110001101 11010111010100110110110111100010 11000111100001100101011110001011 01111000100001001110011111011000