Suppose I have 100 items and I need to split them into two groups.
The options are:
1. Divide by 2, so I get exactly 50 and 50.
2. Select a random number between 1 and 100 and separate that many items from the rest.
In option 2, separating 1 item has the same probability as separating 50 items (1/100). In reality, though, I imagine a Gaussian bell curve where, e.g., 50 is the most probable, 49 and 48 less so, 47 and 46 less still, and so on.
Question:
How can I simulate that "random with probability" selection?
Is there any function to do this in .NET 6?
By the way, I'm working in C#. I think I can handle writing the code myself, which is why I haven't posted any here; what I'm missing is the logic.
Thanks in advance
You can get the bell-shaped version of your option 2 by iterating through the set of items and allocating each one to set 1 or set 2 with probability p = 0.5. The size of each resulting set follows a binomial distribution, B(n=100, p=0.5), which is a discrete approximation to the bell-shaped normal distribution. The actual results will vary, but there's a low likelihood of the set counts differing from 50 by more than 10. With this parameterization the standard deviation is sqrt(n * p * (1 - p)) = sqrt(25) = 5, so a difference of 10 corresponds to 2 standard deviations.
I'm not a C# user so I won't attempt to fake it in your preferred language, but it's pretty straightforward. Since Python is widely used and is pseudocode-like, here's the algorithm in that language:
import random

# create a list with the numbers 1 to 100
values = list(range(1, 101))

# repeat the following set of operations 10 times...
for replication in range(10):
    # create two empty lists
    set1 = []
    set2 = []
    # Note: random.random() produces float values in the range [0.0, 1.0),
    # so the probability of getting a value < 0.5 is 1/2
    # iterate through each of the values from the list created above
    for value in values:
        if random.random() < 0.5:   # with probability 1/2
            set1.append(value)      # the value goes in the first set
        else:
            set2.append(value)      # otherwise it goes in the second set
    # once all values have been allocated, count how
    # many are in each set and print the results
    print(len(set1), ":", len(set2))
which produces 10 splits such as:
49 : 51
48 : 52
47 : 53
59 : 41
39 : 61
50 : 50
43 : 57
54 : 46
50 : 50
60 : 40
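To put a number on the "low likelihood" of a count landing more than 10 away from 50, the exact binomial probabilities can be summed directly. This is only a sketch for checking the claim; the names `binomial_pmf` and `prob_far` are made up for illustration and are not part of the algorithm above.

```python
from math import comb

n, p = 100, 0.5  # same parameters as the split above


def binomial_pmf(k: int) -> float:
    """Probability that exactly k of the n items land in the first set."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Add up the probabilities of every count that is more than 10 away from 50
prob_far = sum(binomial_pmf(k) for k in range(n + 1) if abs(k - 50) > 10)
print(f"P(|count - 50| > 10) = {prob_far:.4f}")
```

The result should come out at roughly 3.5%, so a split like 39 : 61 is uncommon but will turn up now and then when you run many replications.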
If you want to favor one set or the other, adjust the probability p used for the allocations. By simply changing the conditional to
if random.random() < 0.7:
you'll get results such as:
71 : 29
76 : 24
80 : 20
67 : 33
67 : 33
72 : 28
66 : 34
67 : 33
72 : 28
68 : 32
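If it helps when porting, the same idea can be wrapped in a function that takes p as a parameter. The sketch below is one possible arrangement, not the only one; `random_split`, `random_split_by_count`, and the `rng` parameter are hypothetical names introduced here. The second function is an equivalent variant that first draws how many items go to the first set (a binomial count) and then shuffles and slices, which matches the "pick a number, then separate that amount" wording of the question.

```python
import random


def random_split(items, p=0.5, rng=random):
    """Send each item to the first list with probability p, else to the second."""
    first, second = [], []
    for item in items:
        if rng.random() < p:
            first.append(item)
        else:
            second.append(item)
    return first, second


def random_split_by_count(items, p=0.5, rng=random):
    """Draw the size of the first list from B(len(items), p), then pick that many at random."""
    # Counting successes in n independent trials gives a binomial count
    count = sum(rng.random() < p for _ in range(len(items)))
    shuffled = list(items)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)    # random order, so the first `count` items are a random subset
    return shuffled[:count], shuffled[count:]


if __name__ == "__main__":
    values = list(range(1, 101))
    for split in (random_split, random_split_by_count):
        a, b = split(values, p=0.7)
        print(split.__name__, len(a), ":", len(b))
```

Both functions give the same distribution of set sizes; the first keeps the items in their original order within each set, while the second returns them shuffled. As far as the C# side goes, `Random.NextDouble()` also returns a value in [0.0, 1.0), so the `< p` comparison should carry over unchanged.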