I'm trying to do a Monte Carlo-style analysis: I randomly reassign the participants in my experiment to new groups, then reanalyze the data using those random new groups. Here's what I want to do:
Participants are originally grouped into eight groups of four participants each. I want to randomly reassign each participant to a new group, but I don't want any participants to end up in a new group with another participant from their same original group.
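It can help to state the constraint as a check that can be run on any candidate assignment. The sketch below assumes the assignment is stored as two parallel sequences, one holding each participant's original group and one holding their new group. `is_valid_reassignment` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def is_valid_reassignment(orig_groups, new_groups, group_size):
    """Hypothetical helper: True if every new group has exactly `group_size`
    members and no two members of the same original group share a new group."""
    # Every new group must be filled to exactly group_size participants.
    sizes = Counter(new_groups)
    if any(count != group_size for count in sizes.values()):
        return False
    # No (original group, new group) pair may occur more than once.
    pairs = Counter(zip(orig_groups, new_groups))
    return all(count == 1 for count in pairs.values())


# 2 original groups of 2: splitting each pair across the new groups is valid...
print(is_valid_reassignment([0, 0, 1, 1], [0, 1, 0, 1], 2))  # True
# ...but keeping a pair together is not.
print(is_valid_reassignment([0, 0, 1, 1], [0, 0, 1, 1], 2))  # False
```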
Here is how far I got with this:
```python
import random
import pandas as pd
import itertools as it

data = list(it.product(range(8), range(4)))
test_df = pd.DataFrame(data=data, columns=['group', 'partid'])
test_df['new_group'] = None

for idx, row in test_df.iterrows():
    start_group = row['group']
    # new groups already used by members of this participant's original group
    takens = test_df.query('group == @start_group')['new_group'].values
    # new groups that already have 4 participants
    fulls = test_df.groupby('new_group').count().query('partid >= 4').index.values
    possibles = [x for x in test_df['group'].unique()
                 if (x not in takens) and (x not in fulls)]
    test_df.loc[idx, 'new_group'] = random.choice(possibles)
```
The basic idea is to randomly reassign each participant to a new group under two constraints: (a) the new group doesn't already contain one of their original group partners, and (b) the new group doesn't already have 4 participants reassigned to it.
The problem with this approach is that, many times, by the time we try to reassign the last group, the only open slots left are in new groups that already contain someone from that same original group. At that point `possibles` is empty and `random.choice` raises an `IndexError`. I could just re-randomize whenever it fails until it succeeds, but that feels silly. Also, I want to make 100 random reassignments, so that approach could get very slow...
So there must be a smarter way to do this. I also feel like there should be a simpler way to solve this, given how simple the goal feels (but I realize that can be misleading...)
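The retry idea from the question can be sketched without pandas, for comparison. `greedy_reassign` applies the same rule as the loop above. The difference is that it returns `None` at a dead end instead of letting `random.choice` fail on an empty list. `reassign_with_retries` simply starts over until a pass succeeds. Both names are hypothetical. The printed failure count depends on the seed, so no particular failure rate is claimed here.

```python
import random


def greedy_reassign(num_groups, num_members, rng):
    """Same rule as the pandas loop: visit participants group by group and pick
    a random new group that is not full and holds no one from the same original
    group. Returns a flat list of new groups, or None on a dead end."""
    if num_members > num_groups:
        raise ValueError("need num_members <= num_groups")
    fill = [0] * num_groups                        # participants per new group
    used = [set() for _ in range(num_groups)]      # new groups used per original group
    new_groups = []
    for group in range(num_groups):
        for _ in range(num_members):
            possibles = [g for g in range(num_groups)
                         if fill[g] < num_members and g not in used[group]]
            if not possibles:
                return None                        # ran out of legal slots
            choice = rng.choice(possibles)
            fill[choice] += 1
            used[group].add(choice)
            new_groups.append(choice)
    return new_groups


def reassign_with_retries(num_groups, num_members, rng):
    """Repeat the greedy pass until it succeeds; also return the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        result = greedy_reassign(num_groups, num_members, rng)
        if result is not None:
            return result, attempts


rng = random.Random(0)
failures = sum(greedy_reassign(8, 4, rng) is None for _ in range(1000))
print(f"greedy dead ends: {failures} / 1000")
assignment, attempts = reassign_with_retries(8, 4, rng)
print(assignment, attempts)
```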
After sleeping on it, I found a significantly better solution. It never has to retry, and it needs only one loop over the numGroups original groups.
```python
import random
import numpy as np
import pandas as pd
import itertools as it

np.random.seed(0)
numGroups = 4
numMembers = 4
data = list(it.product(range(numGroups), range(numMembers)))
df = pd.DataFrame(data=data, columns=['group', 'partid'])
g = np.repeat(range(numGroups), numMembers).reshape((numGroups, numMembers))

In [95]: g
Out[95]:
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])
```
```python
g = np.random.permutation(g)

In [102]: g
Out[102]:
array([[2, 2, 2, 2],
       [3, 3, 3, 3],
       [1, 1, 1, 1],
       [0, 0, 0, 0]])
```
```python
g = np.tile(g, (2, 1))

In [104]: g
Out[104]:
array([[2, 2, 2, 2],
       [3, 3, 3, 3],
       [1, 1, 1, 1],
       [0, 0, 0, 0],
       [2, 2, 2, 2],
       [3, 3, 3, 3],
       [1, 1, 1, 1],
       [0, 0, 0, 0]])
```
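Notice the diagonals. Only the entries on the diagonals that start in rows 0 to numGroups-1 are needed. The unused entries are shown as `-`: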
```
array([[2, -, -, -],
       [3, 3, -, -],
       [1, 1, 1, -],
       [0, 0, 0, 0],
       [-, 2, 2, 2],
       [-, -, 3, 3],
       [-, -, -, 1],
       [-, -, -, -]])
```
Take the diagonals from top to bottom. The diagonal starting at row i gives the new groups for the members of original group i.
```python
newGroups = []
for i in range(numGroups):
    # diagonal of the numMembers x numMembers window that starts at row i
    newGroups.append(np.diagonal(g[i:i+numMembers]))

In [106]: newGroups
Out[106]:
[array([2, 3, 1, 0]),
 array([3, 1, 0, 2]),
 array([1, 0, 2, 3]),
 array([0, 2, 3, 1])]
```
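The tile-and-diagonal step can also be read as a modular-index rule. Let `p` be the shuffled order of group labels, which is the first column of `g` after `np.random.permutation`. Then member k of original group i lands in new group `p[(i + k) % numGroups]`. The sketch below is a restatement, not part of the answer. It swaps in `np.random.default_rng` for the global seed and checks the rule against the diagonal version. The rule also makes the requirement numMembers <= numGroups explicit: the offsets 0..numMembers-1 must give distinct residues.

```python
import numpy as np

num_groups, num_members = 4, 4
rng = np.random.default_rng(0)
p = rng.permutation(num_groups)          # random relabelling of the groups

# Row i, column k holds (i + k) % num_groups; indexing p with it gives the
# new group of member k of original group i.
idx = np.add.outer(np.arange(num_groups), np.arange(num_members)) % num_groups
new_groups = p[idx]

# Same result as stacking the diagonals of the tiled array.
tiled = np.tile(np.repeat(p, num_members).reshape(num_groups, num_members), (2, 1))
diagonals = np.array([np.diagonal(tiled[i:i + num_members]) for i in range(num_groups)])
print(np.array_equal(new_groups, diagonals))  # True
print(new_groups)
```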
```python
newGroups = np.ravel(newGroups)
df["newGroups"] = newGroups

In [110]: df
Out[110]:
    group  partid  newGroups
0       0       0          2
1       0       1          3
2       0       2          1
3       0       3          0
4       1       0          3
5       1       1          1
6       1       2          0
7       1       3          2
8       2       0          1
9       2       1          0
10      2       2          2
11      2       3          3
12      3       0          0
13      3       1          2
14      3       2          3
15      3       3          1
```
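Pandas can sanity-check a result like this one. Every new group should have `numMembers` participants, and no original group should appear twice inside a new group. The sketch below rebuilds the diagonal construction at the question's size of 8 groups of 4, assuming the same `group`/`newGroups` column names. The construction needs numMembers <= numGroups, which holds here.

```python
import itertools as it

import numpy as np
import pandas as pd

np.random.seed(0)
numGroups, numMembers = 8, 4             # the question's sizes

df = pd.DataFrame(list(it.product(range(numGroups), range(numMembers))),
                  columns=['group', 'partid'])

# Diagonal construction from above, applied to 8 groups of 4.
g = np.repeat(range(numGroups), numMembers).reshape((numGroups, numMembers))
g = np.tile(np.random.permutation(g), (2, 1))
df['newGroups'] = np.ravel([np.diagonal(g[i:i + numMembers]) for i in range(numGroups)])

# Every new group should hold exactly numMembers participants...
sizes_ok = (df.groupby('newGroups').size() == numMembers).all()
# ...and each original group should appear at most once per new group.
no_partners = not df.duplicated(['group', 'newGroups']).any()
print(sizes_ok, no_partners)  # True True
```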
Turned out to be a lot harder than I thought...
I have a brute-force method that keeps reshuffling the group assignments until it finds one in which every member of an original group ends up in a different new group. The benefit of this approach over what you've shown is that it doesn't suffer from "running out of groups at the end".
It can potentially get slow, but for 8 groups of 4 members each it's fast.
```python
import numpy as np
import pandas as pd
import itertools as it

np.random.seed(0)  # reArrange uses NumPy's RNG; random.seed would not affect it
numGroups = 4
numMembers = 4
data = list(it.product(range(numGroups), range(numMembers)))
df = pd.DataFrame(data=data, columns=['group', 'partid'])
g = np.repeat(range(numGroups), numMembers).reshape((numGroups, numMembers))

In [4]: g
Out[4]:
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])
```
```python
def reArrange(g):
    # shuffle each column (member slot) independently across the original groups
    g = np.transpose(g)
    g = [np.random.permutation(x) for x in g]
    return np.transpose(g)
```
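This is why the brute force never "runs out of groups". `reArrange` shuffles each column (member slot) independently, and every column stays a permutation of the group labels. So each new group always receives exactly one participant per member slot, which makes `numMembers` participants in total. Only the "no partners together" condition can fail, and that is what the `while` loop retries. A quick check of that invariant, assuming the question's 8 groups of 4:

```python
import numpy as np

np.random.seed(0)
numGroups, numMembers = 8, 4


def reArrange(g):
    # Shuffle each column (member slot) independently, as in the answer.
    return np.transpose([np.random.permutation(x) for x in np.transpose(g)])


g = np.repeat(range(numGroups), numMembers).reshape((numGroups, numMembers))
for _ in range(100):
    g = reArrange(g)
    # Each column is still a permutation of 0..numGroups-1...
    assert all(sorted(col) == list(range(numGroups)) for col in g.T.tolist())
    # ...so every new group is used exactly numMembers times overall.
    assert (np.bincount(g.ravel(), minlength=numGroups) == numMembers).all()
print("group sizes always balanced")
```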
```python
# check to see if any members in each old group have duplicate new groups;
# if so, repeat
while np.any(np.apply_along_axis(lambda x: len(np.unique(x)) < numMembers, 1, g)):
    g = reArrange(g)
```
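To put a number on "fast" at the question's size, you could count how many reshuffles the loop needs. The sketch below assumes 8 groups of 4 and uses NumPy's `default_rng` instead of the global seed. `brute_force_attempts` is a hypothetical helper, not part of the answer. The attempt count varies with the seed, so no particular figure is claimed.

```python
import numpy as np


def brute_force_attempts(num_groups, num_members, rng):
    """Reshuffle member slots until no original group repeats a new group;
    return the accepted assignment and the number of reshuffles it took."""
    g = np.repeat(np.arange(num_groups), num_members).reshape(num_groups, num_members)
    attempts = 0
    while True:
        attempts += 1
        # Independent shuffle of every column, like reArrange.
        g = np.column_stack([rng.permutation(col) for col in g.T])
        if all(len(np.unique(row)) == num_members for row in g):
            return g, attempts


rng = np.random.default_rng(0)
counts = [brute_force_attempts(8, 4, rng)[1] for _ in range(20)]
print("mean reshuffles over 20 runs:", np.mean(counts))
```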
```python
df["newGroup"] = g.ravel()

In [7]: df
Out[7]:
    group  partid  newGroup
0       0       0         2
1       0       1         3
2       0       2         1
3       0       3         0
4       1       0         0
5       1       1         1
6       1       2         2
7       1       3         3
8       2       0         1
9       2       1         0
10      2       2         3
11      2       3         2
12      3       0         3
13      3       1         2
14      3       2         0
15      3       3         1
```
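Check the diagonal construction before using it for 100 replicates. Member k of original group i always lands alongside every other participant with the same value of (i + k) % numGroups. So across runs, the random permutation changes only the group labels, not who is grouped with whom. If the reanalysis depends on group composition, one option is to also shuffle the order of the original groups around the cycle and the order of members within each group.

Below is a minimal sketch along those lines, assuming the question's 8 groups of 4 and 100 replicates. `cyclic_reassign` and the `rep_<r>` column names are hypothetical. It only varies composition within the cyclic structure, so it is still not a uniform draw from all valid groupings.

```python
import itertools as it

import numpy as np
import pandas as pd


def cyclic_reassign(num_groups, num_members, rng):
    """Hypothetical helper: cyclic construction with two extra shuffles so that
    who ends up together varies, not just the group labels.
    Returns an array new[group, partid]. Requires num_members <= num_groups."""
    if num_members > num_groups:
        raise ValueError("need num_members <= num_groups")
    order = rng.permutation(num_groups)            # random cyclic order of original groups
    new = np.empty((num_groups, num_members), dtype=int)
    for pos, group in enumerate(order):
        ranks = rng.permutation(num_members)       # random order of members within the group
        # the member with rank k goes k steps further round the cycle
        new[group] = (pos + ranks) % num_groups
    return new


numGroups, numMembers, numReps = 8, 4, 100
rng = np.random.default_rng(0)
df = pd.DataFrame(list(it.product(range(numGroups), range(numMembers))),
                  columns=['group', 'partid'])
# Build all replicate columns at once and concatenate them in one step.
reps = {f'rep_{r}': cyclic_reassign(numGroups, numMembers, rng).ravel()
        for r in range(numReps)}
df = pd.concat([df, pd.DataFrame(reps)], axis=1)
print(df.shape)  # (32, 102)
```

Collecting the replicates in a dict and calling `pd.concat` once avoids adding 100 columns one at a time.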