Search code examples
google-sheetsgoogle-sheets-formula

Select random values from a column without replacement multiple times


I have a Google Sheet with data in column A. I have a list of dates in column B. There are more dates in column B than there are values in column A. Column C is empty:

Column A Column B Column C
Value 1 Date 1
Value 2 Date 2
Value 3 Date 3
Date 4
Date 5
Date 6
Date 7
Date 8

I want to select a random value from column A for every date in column B without immediate replacement. Replacement will have to happen after all values have been used, so I want to start the random selection fresh once this happens. So a valid solution would look like:

Column A Column B Column C
Value 1 Date 1 Value 2
Value 2 Date 2 Value 3
Value 3 Date 3 Value 1
Date 4 Value 3
Date 5 Value 1
Date 6 Value 2
Date 7 Value 2
Date 8 Value 3

Dates 1-3 get a random selection of the 3 values in column A, then 4-6 get another, etc. Note that Value 2 occurs on date 6 and date 7 because selection starts again; this is fine.

I would also like to be able to add values to column A later on. I would like the random selection to update for any date after the date I make the update on (using the wallclock or something to determine the date) while keeping the values in column C for days that have passed (including the day the update is made). So if I add Value 4 to column A on date 5, a valid solution would look like:

Column A Column B Column C
Value 1 Date 1 Value 2
Value 2 Date 2 Value 3
Value 3 Date 3 Value 1
Value 4 Date 4 Value 3
Date 5 Value 1
Date 6 Value 4
Date 7 Value 2
Date 8 Value 3
Date 9 Value 2
Date 10 Value 1
Date 11 Value 4
Date 12 Value 1
Date 13 Value 2

The values in column C for dates 1-5 are left unchanged. The values for dates 6 and 7 are recomputed, selected randomly from (value 2, value 4) because they have not been selected since the random selection "reset" after day 3. All 4 values are randomly selected in days 8-11, then "reset" again after day 11.

I tried

=array_constrain(sort(filter(A1:A,A1:A<>""),randarray(counta(A1:A),1),1),5,1)

to select 5 days of values from column A without replacement. However, this only selected 3 values because column A only had 3 values.


Solution

  • I've got a formula you can use which I think gets closer to what you want. With your table at the top left of the sheet, enable iterative calculation in Settings, set the maximum number of iterations to 1 and set the recalculation interval to 'On change and every hour'. Then place the following in C2:

    =arrayformula(let(
    values,tocol(A2:A,1),
    dates,tocol(B2:B,1),
    valcount,counta(values),
    datecount,counta(dates),
    groupcount,ceiling(datecount/valcount),
    paddates,wrapcols(dates,valcount*groupcount,),
    randomindexes,tocol(byrow(randarray(groupcount,valcount),lambda(row,rank(row,row)))),
    selection,xlookup(filter(randomindexes,len(paddates)),sequence(valcount),values),
    if(dates>today(),selection,C2:C)))
    

    I am assuming that the dates in column B are genuine dates and are formatted as such, and that you will 'initialise' the table with a series of dates in column B which start from the day after the day you set the table up. Column C will be populated with a random selection from the values in column A (without replacement), looping after all values have been selected, and the with each new day the previous day's value selected is 'memorised' by exploiting a deliberate circular reference.

    If you change the number of values in column A during the time the table is running, future days will randomly select from the new list of values without affecting the previous day's selections; the one thing I haven't been able to implement is where you've asked for the random selection immediately after a change in the number of values to take into account the previous values randomly selected and not to pick those - I'm not sure how to handle that.