Let's suppose I have 100 items and I need to split them into two groups.
My options are:
1. Divide by 2, so I get exactly 50 and 50.
2. Select a random number between 1 and 100 and then separate that many items from the rest.
With option 2, a split of 1 item is just as likely as a split of 50 items (probability 1/100 each). In reality, though, I imagine a Gaussian bell curve where, e.g., 50 is the most likely size, 49 and 48 are a bit less likely, 47 and 46 less likely still, and so on.
How can I simulate that "random with probability" selection?
Is there any function to do this in .NET 6?
By the way, I'm working in C#, but I think I can handle the code myself, so I'm not posting any here. What I'm missing is the logic.
Thanks in advance
You can get the behavior you describe for option 2 by iterating through the set of items and allocating each one to set 1 or set 2 with probability p = 0.5. The size of each resulting set follows a binomial distribution, B(n=100, p=0.5), which is a discrete approximation to the bell-shaped normal distribution. The actual results will vary, but with that parameterization the standard deviation is sqrt(n * p * (1 - p)) = 5, so there's a low likelihood of a set count differing from 50 by more than 10, which corresponds to 2 standard deviations.
I'm not a C# user so I won't attempt to fake it in your preferred language, but it's pretty straightforward. Since Python is widely used and is pseudocode-like, here's the algorithm in that language:
import random

# create a list with the numbers 1 to 100
values = list(range(1, 101))

# repeat the following set of operations 10 times...
for replication in range(10):
    # create two empty lists
    set1 = []
    set2 = []
    # Note: random.random() produces float values in the range [0.0, 1.0),
    # so the probability of getting a value < 0.5 is 1/2
    # iterate through each of the values from the list created above
    for value in values:
        if random.random() < 0.5:   # with probability 1/2...
            set1.append(value)      # ...the value goes in the first set
        else:
            set2.append(value)      # otherwise it goes in the second set
    # once all values have been allocated, count how
    # many are in each set and print the results
    print(len(set1), ":", len(set2))
which produces 10 splits such as:
49 : 51
48 : 52
47 : 53
59 : 41
39 : 61
50 : 50
43 : 57
54 : 46
50 : 50
60 : 40
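Ten splits are too few to see the bell shape, so here is a rough sketch that repeats the same per-item allocation many times and tallies the size of set 1. It also compares the simulated chance of landing more than 10 away from 50 with the exact binomial value. The replication count, the fixed seed, the helper name set1_size and the histogram scaling are all illustrative assumptions, not part of the answer above.

```python
import random
from collections import Counter
from math import comb, sqrt

N_ITEMS = 100           # number of items to split
P = 0.5                 # probability that an item goes to set 1
REPLICATIONS = 10_000   # illustrative choice (assumption)

rng = random.Random(12345)  # fixed seed (assumption) so reruns give the same tallies


def set1_size(n, p):
    """Count how many of n items land in set 1 when each goes there with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


# Tally how often each set 1 size occurs across all replications
sizes = Counter(set1_size(N_ITEMS, P) for _ in range(REPLICATIONS))

# Theoretical mean and standard deviation of B(n, p)
mean = N_ITEMS * P
std_dev = sqrt(N_ITEMS * P * (1 - P))

# Exact probability that the set 1 count differs from the mean by more than 10
exact_tail = sum(
    comb(N_ITEMS, k) * P**k * (1 - P) ** (N_ITEMS - k)
    for k in range(N_ITEMS + 1)
    if abs(k - mean) > 10
)

# Fraction of simulated splits that fell more than 10 away from the mean
observed_tail = sum(c for k, c in sizes.items() if abs(k - mean) > 10) / REPLICATIONS

print(f"mean = {mean}, standard deviation = {std_dev}")
print(f"P(|size - 50| > 10): exact {exact_tail:.4f}, simulated {observed_tail:.4f}")

# Crude text histogram of set 1 sizes: one '#' per 50 occurrences
for k in range(35, 66):
    print(f"{k:3d} {'#' * (sizes[k] // 50)}")
```

The histogram should peak near 50 and thin out quickly on both sides, which is the bell shape described in the question.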
If you want to favor one set or the other, adjust the probability p used for the allocations. By simply changing the conditional to
if random.random() < 0.7:
you'll get results such as:
71 : 29
76 : 24
80 : 20
67 : 33
67 : 33
72 : 28
66 : 34
67 : 33
72 : 28
68 : 32
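If you'd rather not edit the conditional each time, the same allocation can be wrapped in a small function that takes the probability as a parameter. This is only a minimal sketch: the name random_split and its p and rng parameters are hypothetical, chosen here for illustration.

```python
import random


def random_split(items, p=0.5, rng=random):
    """Split items into two lists; each item goes to the first list with probability p.

    rng is anything with a random() method returning a float in [0.0, 1.0),
    such as the random module itself or a random.Random instance.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    first, second = [], []
    for item in items:
        if rng.random() < p:      # with probability p...
            first.append(item)    # ...the item goes in the first list
        else:
            second.append(item)   # otherwise it goes in the second list
    return first, second


items = list(range(1, 101))
set1, set2 = random_split(items, p=0.7)
print(len(set1), ":", len(set2))
```

Passing a seeded random.Random instance as rng makes a particular split reproducible, which can help when testing. The same loop should translate to C# fairly directly, for example by comparing a uniform random double in [0.0, 1.0) against p.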