Search code examples
algorithmmatlabrandomexperimental-design

How do I create a constrained, randomized experimental protocol?


I am trying to build a protocol in Matlab to randomize experimental trial orders within a framework, and cannot create an efficient algorithm to do so.

Problem

I want to administer T number of trials ([t1, t2, t3…], each with different parameters), and repeat these for R runs [r1, r2, r3…]. Each run will be made up of the same exact trials, but in a random order for each run. Furthermore, the randomization is constrained, to account for effects of being earlier or later in a given run. Over all runs, each trial must occur an equal number of times early, middle, and late in the run. To do this, each run is broken into G groups [g1, g2, g3…]. I want to make an algorithm that will generate a set of R random runs, constrained such that each trial t appears in each group of any run N=R/G times.

Example

T = 9 trials [t1, t2,…, t9] will be repeated for R = 6 runs [r1, r2, …, r6]. A number of groups G = 3, [g1, g2, g3], will be used. An example set of 6 runs is shown below. (| characters separate groups. Order within groups does not matter, as trial orders within groups will be randomized later. Groups have NO meaning outside of building the run. Once the run is constructed, it will be viewed as one cohesive unit.)

      s1 s2 s3   s4 s5 s6   s7 s8 s9
r1 = [t1 t2 t3 | t4 t5 t6 | t7 t8 t9]
r2 = [t3 t4 t7 | t2 t6 t8 | t1 t5 t9]
r3 = [t5 t8 t9 | t1 t2 t7 | t3 t4 t6]
r4 = [t2 t4 t6 | t3 t5 t9 | t1 t7 t8]
r5 = [t5 t6 t9 | t1 t7 t8 | t2 t3 t4]
r6 = [t1 t7 t8 | t3 t4 t9 | t2 t5 t6]

Note that each trial appears once per run (row), and N = R/G = 6/3 = 2 times per group (“|” delimited column). Each individual column is a slot [s1, s2, …, s9], each of which contains a different trial for each run. My implementation of the algorithm will require R = 10, and T >= 320.

My attempts

I have tried 3 different algorithms, all of which have failed, either completely (using T = 20, small) or they do not scale well (take a very long time with T = 320). I find G using G = max(gcd(d(R), T)), where gcd is the “greatest common divisor” between the two arguments, and d is the “divisors” function, which lists all divisors of the input (not including the input itself). A brief description of my algorithms:

1. Create a random trial order as a prototype for every run (same random prototype for each). For each run, go through each slot and switch the trial that is there with a trial in a randomly chosen different slot (provided both trials are allowed to occupy their respective new slots given the maximum number of times any given trial can appear in the same group, N). Because the randomization eventually leads to situations where no switches exist, any time such a case occurs, the algorithm is restarted until a final solution is found. This works for T = 20 usually under 10 seconds, but becomes quickly intractable.

2. For each trial, go through each run and randomly pick a group that this trial still needs to and is still allowed to join (the group isn’t full for that run, and fewer than N of the same trial are in that group over all runs up until this one). This algorithm also leads to situations where no choices exist, so it is also restarted when it gets stuck. This algorithm takes a very long time to converge.

3. For each run, assign each trial to a group, assigning trials with the least number of assigned groups first. This algorithm also gets stuck, and also takes a very long time.

I can provide Matlab code for these implementations, if you wish.


Ideally, I would like a deterministic algorithm that finds every unique group combination (though I don’t know how many exist), and I would select a random grouping to use each time. Any help with this problem will be greatly appreciated.


Solution

  • For parameters T=3N R=3 here is one solution:

    Generate a random permutation for the first row

    The second row is the first row shifted so that the trials that were in the first group are now in the second group, the trials that were in the second group are now in the first group, and the trials that were in the third group are now in the first group.

    The third row is the second row shifted along again.

    So this pattern is

    [t1 t2 t3 | t4 t5 t6 | t7 t8 t9]

    [t7 t8 t9 | t1 t2 t3 | t4 t5 t6]

    [t4 t5 t6 | t7 t8 t9 | t1 t2 t3]

    For parameters T=3N R=3M I suggest that you divide the rows into groups of three and generate each group of three as above. You could of course shuffle rows after generation.

    Note that you use a different permutation for each set of three rows. For example with 6 rows you might end up with

    123 456 789

    789 123 456

    456 789 123

    148 256 379

    379 148 256

    256 379 148

    A generalization of this would be to use https://en.wikipedia.org/wiki/Latin_square of size 3N for integer N. If the number of elements in the row is a multiple of 3N chop them up into 3N pieces and use the Latin Square to produce 3N rows. For N=1 this is pretty much the same thing as there is really only one Latin Square of side 3, but for N=2 there are 9408 genuinely different Latin Squares of side 6, so picking one at random actually means something. Note also in the same article ref(4) tells you how to generate Latin Squares at random. I haven't looked at this, but perhaps there are ideas in this you could use.