Search code examples
pythonpandasrandomdata-scienceexperimental-design

Randomly reassign participants to groups such that participants originally from same group don't end up in same group


I'm basically trying to do this Monte Carlo kind of analysis where I randomly reassign the participants in my experiment to new groups, and then reanalyze the data given the random new groups. So here's what I want to do:

Participants are originally grouped into eight groups of four participants each. I want to randomly reassign each participant to a new group, but I don't want any participants to end up in a new group with another participant from their same original group.

Here is how far I got with this:

import random
import pandas as pd
import itertools as it

data = list(it.product(range(8),range(4)))
test_df = pd.DataFrame(data=data,columns=['group','partid'])
test_df['new_group'] = None

for idx, row in test_df.iterrows():
    start_group = row['group']
    takens      = test_df.query('group == @start_group')['new_group'].values
    fulls       = test_df.groupby('new_group').count().query('partid >= 4').index.values
    possibles   = [x for x in test_df['group'].unique() if (x not in takens)
                                                      and (x not in fulls)]
    test_df.loc[idx,'new_group'] = random.choice(possibles)

The basic idea here is that I randomly reassign a participant to a new group with the constraints that (a) the new group doesn't have one of their original group partners in, and (b) the new group doesn't have 4 or more participants already reassigned to it.

The problem with this approach is that, many times, by the time we try to reassign the last group, the only remaining group slots are in that same group. I could also just try to re-randomize when it fails until it succeeds, but that feels silly. Also, I want to make 100 random reassignments, so that approach could get very slow....

So there must be a smarter way to do this. I also feel like there should be a simpler way to solve this, given how simple the goal feels (but I realize that can be misleading...)


Solution

  • EDIT: Better solution


    After sleeping on it I've found a significantly better solution that's in ~ Big O of numGroups.

    Sample data

    import random
    import numpy as np
    import pandas as pd
    import itertools as it
    
    np.random.seed(0)
    numGroups=4
    numMembers=4
    
    data = list(it.product(range(numGroups),range(numMembers)))
    df = pd.DataFrame(data=data,columns=['group','partid'])
    

    Solution

    g = np.repeat(range(numGroups),numMembers).reshape((numGroups,numMembers))
    In [95]: g
    Out[95]: 
    array([[0, 0, 0, 0],
           [1, 1, 1, 1],
           [2, 2, 2, 2],
           [3, 3, 3, 3]])
    
    g = np.random.permutation(g)
    In [102]: g
    Out[102]: 
    array([[2, 2, 2, 2],
           [3, 3, 3, 3],
           [1, 1, 1, 1],
           [0, 0, 0, 0]])
    
    g = np.tile(g,(2,1))
    In [104]: g
    Out[104]: 
    array([[2, 2, 2, 2],
           [3, 3, 3, 3],
           [1, 1, 1, 1],
           [0, 0, 0, 0],
           [2, 2, 2, 2],
           [3, 3, 3, 3],
           [1, 1, 1, 1],
           [0, 0, 0, 0]])
    

    Notice the diagonals.

    array([[2, -, -, -],
           [3, 3, -, -],
           [1, 1, 1, -],
           [0, 0, 0, 0],
           [-, 2, 2, 2],
           [-, -, 3, 3],
           [-, -, -, 1],
           [-, -, -, -]])
    

    Take the diagonals from top to bottom.

    newGroups = []
    for i in range(numGroups):
        newGroups.append(np.diagonal(g[i:i+numMembers]))
    
    In [106]: newGroups
    Out[106]: 
    [array([2, 3, 1, 0]),
     array([3, 1, 0, 2]),
     array([1, 0, 2, 3]),
     array([0, 2, 3, 1])]
    
    newGroups = np.ravel(newGroups)
    df["newGroups"] = newGroups
    
    In [110]: df
    Out[110]: 
        group  partid  newGroups
    0       0       0          2
    1       0       1          3
    2       0       2          1
    3       0       3          0
    4       1       0          3
    5       1       1          1
    6       1       2          0
    7       1       3          2
    8       2       0          1
    9       2       1          0
    10      2       2          2
    11      2       3          3
    12      3       0          0
    13      3       1          2
    14      3       2          3
    15      3       3          1
    

    Old Solution: Brute Force Method


    Turned out to be a lot harder than I thought...

    I have a brute force method that basically guesses different permutations of groups until it finally gets one where everyone ends up in a different group. The benefit of this approach vs. what you've shown is that it doesn't suffer from "running out of groups at the end".

    It can potentially get slow - but for 8 groups and 4 members per group it's fast.

    Sample data

    import random
    import numpy as np
    import pandas as pd
    import itertools as it
    
    random.seed(0)
    numGroups=4
    numMembers=4
    
    data = list(it.product(range(numGroups),range(numMembers)))
    df = pd.DataFrame(data=data,columns=['group','partid'])
    

    Solution

    g = np.repeat(range(numGroups),numMembers).reshape((numGroups,numMembers))
    
    In [4]: g
    Out[4]: 
    array([[0, 0, 0, 0],
           [1, 1, 1, 1],
           [2, 2, 2, 2],
           [3, 3, 3, 3]])
    
    def reArrange(g):
        g = np.transpose(g)
        g = [np.random.permutation(x) for x in g]
        return np.transpose(g)
    
    # check to see if any members in each old group have duplicate new groups
    # if so repeat
    while np.any(np.apply_along_axis(lambda x: len(np.unique(x))<numMembers,1,g)):
        g = reArrange(g)
    
    df["newGroup"] = g.ravel()
    
    In [7]: df
    Out[7]: 
        group  partid  newGroup
    0       0       0         2
    1       0       1         3
    2       0       2         1
    3       0       3         0
    4       1       0         0
    5       1       1         1
    6       1       2         2
    7       1       3         3
    8       2       0         1
    9       2       1         0
    10      2       2         3
    11      2       3         2
    12      3       0         3
    13      3       1         2
    14      3       2         0
    15      3       3         1