c# algorithm random probability

Split an amount into 2 random groups following a Gaussian bell curve


Let's suppose I have 100 items and I need to split them into two groups.

Options can be:

  1. Divide by 2, so I get exactly 50 and 50

  2. Select a random number between 1 and 100 and then separate that amount from the rest.

In option 2, 1 item has the same probability as 50 items (1/100). In reality, though, I imagine a Gaussian bell curve where, e.g., 50 has the highest probability, 49 and 48 slightly less, 47 and 46 less still, and so on.

Question:

How can I simulate that "random with probability" selection?
Is there any function to do this in .NET 6?

By the way, I'm working in C#. I think I can handle the code myself, which is why I haven't posted any here; it's the logic I need help with.

Thanks in advance


Solution

  • You can get a bell-shaped version of your option 2 by iterating through the set of items and allocating each one to set 1 or set 2 with probability p = 0.5. The size of each resulting set follows a binomial distribution, B(n=100, p=0.5), which is a discrete approximation to the bell-shaped normal distribution. The actual results will vary, but there's a low likelihood of the set counts differing from 50 by more than 10, which corresponds to 2 standard deviations with that parameterization (the standard deviation is sqrt(n * p * (1 - p)) = sqrt(25) = 5).

    I'm not a C# user so I won't attempt to fake it in your preferred language, but it's pretty straightforward. Since Python is widely used and is pseudocode-like, here's the algorithm in that language:

    import random
    
    # create a list with the numbers 1 to 100
    values = list(range(1, 101))
    
    # repeat the following set of operations 10 times...
    for replication in range(10):
        # create two empty arrays
        set1 = []
        set2 = []
    
        # Note: random.random() produces float values in the range [0.0, 1.0),
        # the probability of getting a value < 0.5 is 1/2
    
        # iterate through each of the values from the array created above
        for value in values:
            if random.random() < 0.5:  # with probability 1/2
                set1.append(value)     # the value goes in the first set
            else:
                set2.append(value)     # otherwise it goes in the second set
    
        # once all values have been allocated, count how
        # many are in each set and print the results
        print(len(set1), " : ", len(set2))
    

    which produces 10 splits such as:

    49  :  51
    48  :  52
    47  :  53
    59  :  41
    39  :  61
    50  :  50
    43  :  57
    54  :  46
    50  :  50
    60  :  40
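
    To put a number on the "low likelihood" mentioned earlier, the binomial probabilities can be summed exactly. This is only a rough sketch using Python's standard library; the helper name pmf is made up for illustration and is not part of any library.

```python
from math import comb, sqrt

n, p = 100, 0.5

# Standard deviation of a binomial count: sqrt(n * p * (1 - p))
sd = sqrt(n * p * (1 - p))

def pmf(k):
    """Exact probability that exactly k of the n items land in set 1.

    Hypothetical helper, named here only for illustration.
    """
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability that the set-1 count differs from 50 by more than 10,
# i.e. by more than 2 standard deviations
tail = sum(pmf(k) for k in range(n + 1) if abs(k - 50) > 10)

print(f"standard deviation: {sd}")
print(f"P(|count - 50| > 10) = {tail:.4f}")
```

    The tail probability should come out at a few percent, so splits like 39 : 61 do happen occasionally but are uncommon.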
    

    If you want to favor one set or the other, adjust the allocation probability p. By simply changing the conditional to

    if random.random() < 0.7:
    

    you'll get results such as:

    71  :  29
    76  :  24
    80  :  20
    67  :  33
    67  :  33
    72  :  28
    66  :  34
    67  :  33
    72  :  28
    68  :  32
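
    If you only need the sizes of the two groups rather than which item went where, the same idea can be reduced to drawing a single count. The sketch below shows two ways to do that, under the assumption that a group size alone is enough: summing n coin flips (the exact binomial draw used above) and rounding a Gaussian draw with the matching mean and standard deviation (an approximation, clamped so it stays within 0..n). The function names are hypothetical, chosen for this example only.

```python
import random

def split_count_binomial(n, p=0.5, rng=random):
    """Size of set 1 when each of n items joins it independently with probability p."""
    # Each comparison is True (counted as 1) with probability p
    return sum(rng.random() < p for _ in range(n))

def split_count_normal(n, p=0.5, rng=random):
    """Normal approximation: round a Gaussian draw and clamp it to [0, n]."""
    mean = n * p
    sd = (n * p * (1 - p)) ** 0.5
    k = round(rng.gauss(mean, sd))
    return max(0, min(n, k))

# Five example splits of 100 items, printed in the same format as above
for _ in range(5):
    k = split_count_binomial(100)
    print(k, " : ", 100 - k)
```

    The binomial version matches the item-by-item allocation exactly. The normal version is only an approximation, but it is handy when n is large and looping over every item would be wasteful.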