Search code examples
pythonrandom

Python random unique values from two non overlapping ranges


100 random unique values from two ranges or to be more accurate there is one range and a subrange that is excluded from valid values.

Example have a range from 0 to 10000, random 100 numbers that are not in range 10 to 20

Requirements:

  • The Subrange can be at the very beginning or at the very end.
  • Memory overhead to absolute minimum.
  • Randomization as close to random.shuffle() as possible.

I know random.sample(xrange(0,10000),100) gives 100 unique values.

Setup I would store three values [start,end,total]

  1. start = start of subrange
  2. end = end of subrange
  3. total = the length of the range

Best I can come up with:

randlist=[]
while len(randlist)<100:
    temp=random.randint(0,total)
    if temp < start or temp > end:
       if temp not in randlist:
           randlist.append(temp)

Is this true random (pseudorandom) or am I affecting it in any way?


Solution

  • randlist = [r + (end - start + 1) * (r >= start) for r in
                random.sample(range(total - end + start), 100)]
    

    Example / "proof":

    • total=10, start=2, end=5
    • There are 7 allowed numbers: 0, 1, 6, 7, 8, 9, 10
    • range(total-end+start) = range(7) picks from 7 numbers 0..6 (so far so good)
    • Numbers larger than or equal to start=2 are shifted upwards by end-start+1=4
    • Resulting numbers are in 0, 1, 6, 7, 8, 9, 10.

    Demo:

    >>> sorted(randlist2(2000000000, 10000000, 1900000000))
    [176827, 3235435, 3278133, 3673989, 5148447, 8314140, 8885997, 1900189345, 1902880599,
    ...
    1997494057, 1997538971, 1997854443, 1997907285]
    

    This works until over 2 billion, easily beating the required upper limit of "the number of wikipedia english wikipedia pages, so whatever many million that is" :-). After that it gets OverflowError: Python int too large to convert to C ssize_t. I see no spike in memory usage of my PC and the result is instant. This is using Python 3, obviously.