Search code examples
pythonarraysrandomlinked-listgenerator

Generate random numbers in a range while keeping a minimum distance between values


I am after a function that produces an array with random numbers(between certain range), where no numerical value can be within a certain constant of another numerical value. This works best if it fills random 'slots' within an array. I would like it to keep on adding values until no more values can be added or the number of iterations is reached.

For instance: if there is a minimum spacing of 2, and within the array already are the values [0,3,8,10] a new value can be added to the array with a value between 5.01 and 5.99 but no values can be added between the other numbers. If the range was greater than 12, then another value could be added after 10 as well.

I have created something similar which works by creating a minimum spacing, but it doesn't fulfill the full criteria because if I increased the number of values in the randomyvalues array the spacing would just end up being the specified value, so not random, and if i decreased the number of values in the array then the spacing might be too great between values. An example of what I have done already is below:

import numpy as np

random_y_values = []
for i in range(0,20):
    n = np.random.uniform(0,15)
    random_y_values.append(n)
    random_y_values.sort()
print(random_y_values)
    

def drop_close_values(values, tolerance):
    ret = list()
    last_added = None
    for i in sorted(values):
        if (last_added is None) or (last_added + tolerance < i):
            ret.append(i)
            last_added = i
    return ret

drop_close_values(random_y_values, 2.5)
elements=len(random_y_values)

So, with the example I created, the random list generated could have one value between 0-5 and the remaining 19 could be between 19-20. I am hoping to make it so that there is a pseudorandom spread between the numbers resulting in a max gap of 2*minimum spacing.


Solution

  • Solution 1

    I am after a function that produces an array with random numbers(between certain range), where no numerical value can be within a certain constant of another numerical value.

    My first thought while idealizing this was to generate increasing random numbers, keeping the invariant that every new number is at least gap higher than the previous one.

    The challenge with this approach is that you need to control not only the minimum, but also the average gap size, so successors are not too much far from the predecessors, otherwise the number of points is too small.

    I think a pretty neat solution for this is to use an exponential distribution generation. This distribution produces small numbers more frequently than big number, where the scale factor is the mean value and controls how dense/sparse the numbers would be on average.

    import random
    
    def gap_random_expovar(low, high, gap, scale):
        while low < high:
            low += random.expovariate(1/scale)
            if low <= high:
                yield low
                low += gap
    

    A good heuristics for choosing the scale factor for this function is avg_gap - min_gap, as it will generate numbers whose average distance between each other is avg_gap.

    One thing about this solution (that might be good or bad, depending on your use case) is that there is no upper bound on the distance between numbers, as you only control the average value. That means that your result set is very very random, and so is the number of points you get. In a very extreme case, you could roll a huge number with expovar and get a set with just 1 number or even empty set.

    What is good about this solution? You get numbers already sorted, the performance is great, and the math is beautiful.

    Results:

    for _ in range(1000000):
        print(list(gap_random_expovar(low=0, high=100, gap=5, scale=5))) 
        # expects: avg 10 points, avg gap of 100/10 = 10
    
    # [  8.49 17.28 25.42 30.77 38.51 47.59 62.65 71.12 80.02 91.21 97.32 ]
    # [  8.25 13.73 33.13 41.79 48.26 55.65 67.04 72.10 77.25 83.53 94.58 ]
    # [  8.25 19.64 27.14 33.46 38.70 46.99 56.21 63.60 71.61 85.88 91.50 ]
    # [ 10.65 19.79 25.08 35.29 56.18 75.95 87.94 98.27 ]
    # [  7.92 14.52 21.46 26.56 36.45 48.59 55.63 60.90 66.89 72.02 82.58 90.10 96.01 ]
    # ... 1 million runs
    # average # of pts: 10.12  (min 2,         max 16,    stdev: 1.618)
    # average gap size:  9.73  (min 5.0000001, max 83.47, stdev: 4.715)
    
    for _ in range(1000000):
        print(list(gap_random_expovar(low=0, high=100, gap=5, scale=5/3))) 
        # expects: avg 15 points, avg gap of 100/15 = 6.67
    
    # [  3.23  9.17 14.66 19.66 26.79 34.35 41.39 46.84 52.01 59.27 65.61 70.63 76.15 81.51 90.23 96.98 ]
    # [  1.27  8.11 17.17 23.84 30.49 40.51 45.83 54.66 64.92 70.44 79.31 85.30 94.76 ]
    # [  1.05  6.05 16.81 21.85 27.02 33.17 42.88 51.70 57.58 64.50 71.39 83.60 92.13 98.08 ]
    # [  1.93 11.24 16.51 22.75 29.28 34.66 39.86 45.02 50.81 57.18 63.79 69.42 75.54 81.44 87.19 93.57 ]
    # [  0.59  5.62 11.19 18.90 24.04 30.08 36.85 43.67 48.82 54.21 61.82 69.11 74.56 82.68 88.03 93.13 99.21 ]
    # ... 1 million runs
    # average # of pts: 15.28  (min 10,          max 19,    stdev: 1.026)
    # average gap size:  6.64  (min  5.00000005, max 31.22, stdev: 1.637)
    

    If you don't like the unbounded nature of the exponential distribution, you could change it for a uniform or triangular distribution, and you'd have to define a sensible value for the maximum gap (for example 2 * min_gap, as you mentioned in the question).


    Solution 2

    This works best if it fills random 'slots' within an array. I would like it to keep on adding values until no more values can be added or the number of iterations is reached.

    At first I ignored this hint, but that is actually an excellent idea! If you continue filling in the empty spaces while keeping the minimum gap invariant, you can make your set denser interactively. Also, we can use a uniform distribution, which is bounded and easier to manage.

    A clever implementation of this turned out to be tricky, but the general idea is:

    • On each iteration, keep track of the remaining free slots with a "gap" distance from the existing points.
    • Use the sum of the free space slots as the scale of the random number.
    • When a number is created, iterate over the free slots and see where the point "falls in" the segmented free space (ignore the blocked spaces).
    • The free space reduces on every step, so the algorithm converges and always terminate.

    In my implementation, I used a double-linked list (deque) to store the free slot ranges as tuples like (low, high). Spaces within each low and high can receive a new point without breaking the rule of minimum gap.

    When a point is placed inside one of those slots, it either splits into smaller slots at a gap distance from the point, or vanishes the slot altogether if it is swollen by the gaps. The function is a generator function, and stops the iteration when the deque is empty.

    from collections import deque
    import random
    
    def gap_random_uniform(low, high, gap):
        slots, freespace = deque([(low, high)]), high - low
        while slots:
            x = random.uniform(0, freespace)
            while True:
                slotlow, slothigh = slots[0]
                slotspace = slothigh - slotlow
                if x < slotspace:
                    slots.popleft()
                    freespace -= slotspace
                    xlow, xhigh = x - gap, x + gap
                    if xlow > 0:
                        slots.append((slotlow, slotlow + xlow))
                        freespace += xlow
                    if xhigh < slotspace:
                        slots.appendleft((slotlow + xhigh, slothigh))
                        freespace += slotspace - xhigh
                    yield x + slotlow
                    break
                x -= slotspace
                slots.rotate(-1)
    

    There is a lot going on there, but the general idea is that when a new random point is generated if it is higher than the current slot, I keep rotating the linked list until the point falls in place.

    You decide if you want to take only a specific number of points, or you can exhaust the iterator and get as many points as possible.

    Results:

    from itertools import islice
    
    for _ in range(1000000):
        print(list(sorted(islice(gap_random_uniform(low=0, high=100, gap=5), 10))))
        # using itertools.islice to pick exactly 10 points
        # expects: avg gap of 100/10 = 10
    
    # [  0.69  6.96 13.71 24.34 31.77 55.19 61.49 78.53 84.81 89.98 ]
    # [ 10.94 18.61 27.70 41.60 53.00 58.16 64.04 83.90 92.22 99.87 ]
    # [  2.61 13.17 33.20 39.05 52.56 58.41 66.53 75.51 85.26 92.44 ]
    # [  6.42 14.50 22.90 31.16 41.32 48.09 62.80 71.90 87.19 99.70 ]
    # [  3.61 20.09 29.13 37.67 44.54 52.00 70.08 78.28 88.33 99.85 ]
    # ... 1 million runs
    # average # of pts: 10     (min 10,          max 10,    stdev: 0)
    # average gap size:  9.96  (min  5.00000055, max 48.17, stdev: 4.237)
    
    for _ in range(1000000):
        print(list(sorted(gap_random_uniform(low=0, high=100, gap=5))))
        # getting all the points
        # expects: avg 15 points, avg gap of 100/15 = 6.67
    
    # [  2.50  8.78 14.90 23.87 30.49 40.18 47.08 52.39 59.26 65.69 73.60 80.27 86.02 91.74 98.11 ]
    # [  1.86  7.16 13.90 20.45 25.48 30.49 35.63 40.66 45.94 51.13 60.17 66.87 73.78 78.95 86.42 92.72 99.06 ]
    # [  3.42 11.66 21.32 27.21 34.58 40.39 48.15 55.33 62.16 67.52 76.24 84.89 93.43 98.72 ]
    # [  2.64  8.46 17.50 22.55 29.00 38.82 48.18 55.19 60.19 65.22 72.15 78.81 84.95 89.98 99.96 ]
    # [  4.72  9.94 17.27 22.40 27.84 34.40 40.89 48.74 55.66 64.71 69.80 75.31 81.04 89.10 96.33 ]
    # ... 1 million runs
    # average # of pts: 15.45  (min 12,           max 19,        stdev: 0.915)
    # average gap size:  6.66  (min  5.000000002, max 9.9999993, stdev: 1.400)
    

    What I like in this solution over the first is the control, specially if you need to guarantee a minimum required number of points in the set.