Search code examples
excelexcel-formulasampling

Weighted Random Sampling by Group Variable in Excel?


I am trying to figure out how to randomly sample IDs matching a group variable, but specifically randomly sampling by weighting each grouping variable. For instance, take this data:

enter image description here

I want to weight Group A at 60%, Group B at 20%, Group C at 15%, and Group D at 5%, and then randomly sample 10 IDs, without duplicates, according to those weights, returning those IDs in another column.

Hopefully this makes sense and thank you for any help in advance!

I have figured out random sampling generally but am struggling with the weighting and not sampling duplicates.


Solution

  • You could use the following:

    =LET(a,{"A";"B";"C";"D"},
         b,{60;20;15;5},
         c,TOCOL(IFS(SEQUENCE(,SUM(b))<=LOOKUP(B2:B25,HSTACK(a,b)),A2:A25),2),
    TAKE(UNIQUE(SORTBY(c,RANDARRAY(ROWS(c)))),10))
    

    It first declares a to be your groups and it's weight b. c looks up the values in column B in array a and returns it's weight b. If the returned value is smaller than or equal to the sequence of the sum of b it returns the value of column A, else it returns an error. This results in the values of column A being repeated the number of times mentioned in b where the value in column B matches the value of a. TOCOL removes the errors and makes all values become a one column array. This is sorted randomly and the first 10 values of the uniques are taken.