I want to run an A/B experiment with 3 buckets. With 2 buckets, I could split all the users into 2 sets using random.sample:
from random import sample
test = sample(all_users, k=100)
control = set(all_users) - set(test)
Since I need 3 sets of users, will the following code ensure that every user has an equal chance of ending up in each variant?
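NUM_USERS = int(len(all_users) * 0.33)  # note: 0.33 rounds down, so variant3 takes the leftover users
# sample() needs a sequence (sets are rejected from Python 3.11 on), and
# set difference needs sets on both sides, so convert explicitly
variant1 = set(sample(list(all_users), NUM_USERS))
variant2 = set(sample(list(set(all_users) - variant1), NUM_USERS))
variant3 = set(all_users) - variant1 - variant2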
I think it would, but since this is code, you can easily test it empirically: wrap it in a function, call it a few thousand times, and check that each user ends up in each variant with roughly the same frequency.
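A minimal sketch of that empirical check, assuming all_users is a list of hashable user IDs. The names split_three and assignment_frequencies are hypothetical, and 99 users and 2,000 trials are arbitrary choices. split_three reproduces the question's approach, and each user's three frequencies should come out close to each variant's size divided by the number of users. Because 0.33 is slightly less than 1/3, the third variant collects the leftover users (35 of 99 here, versus 32 for the other two), so its frequency comes out a little higher.

```python
from collections import Counter
from random import sample


def split_three(all_users):
    """The question's approach: sample two groups, the rest form the third."""
    num_users = int(len(all_users) * 0.33)
    variant1 = set(sample(list(all_users), num_users))
    variant2 = set(sample(list(set(all_users) - variant1), num_users))
    variant3 = set(all_users) - variant1 - variant2
    return variant1, variant2, variant3


def assignment_frequencies(all_users, trials=2000):
    """Run the split many times; return {user: [freq_v1, freq_v2, freq_v3]}."""
    counts = {user: Counter() for user in all_users}
    for _ in range(trials):
        for index, variant in enumerate(split_three(all_users)):
            for user in variant:
                counts[user][index] += 1
    return {
        user: [counts[user][i] / trials for i in range(3)]
        for user in all_users
    }


users = list(range(99))  # hypothetical user IDs
freqs = assignment_frequencies(users)
for user in users[:3]:
    print(user, [round(f, 3) for f in freqs[user]])
```

If the split is fair, the printed frequencies for every user should hover around the same values (about 0.32, 0.32 and 0.35 here); a user whose numbers drift consistently away from the others would point to a bias.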
Another way to express this is to shuffle the users and then split the shuffled list into groups. For example, something like:
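from random import shuffle
# shuffle() works in place; copying into a new list means we don't
# modify the caller's version of all_users
elems = list(all_users)
shuffle(elems)
# deal users round-robin into three groups; after shuffling this is just
# as random as taking consecutive slices, but it always yields exactly
# three groups (slicing by a rounded-up group size can yield fewer, e.g.
# for 4 users) and the group sizes differ by at most one
g1, g2, g3 = (elems[i::3] for i in range(3))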
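The same shuffle-then-split idea generalizes to any number of buckets. Below is a sketch with a hypothetical helper, split_into_buckets; the optional seed parameter is an addition not in the answer, useful only if you need to reproduce a split. When the user count isn't divisible by k, the bucket sizes differ by one, so the sketch also shuffles the order of the buckets: that way no particular bucket is always the larger one, and each user's chance of landing in any given bucket is exactly 1/k.

```python
from random import Random


def split_into_buckets(all_users, k, seed=None):
    """Shuffle a copy of the users and deal them round-robin into k buckets.

    Bucket sizes differ by at most one; shuffling the bucket order means
    each user lands in any given bucket with probability exactly 1/k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = Random(seed)        # pass a seed only if you want a reproducible split
    elems = list(all_users)   # copy so the caller's collection is untouched
    rng.shuffle(elems)
    buckets = [elems[i::k] for i in range(k)]
    rng.shuffle(buckets)      # randomize which buckets get the extra users
    return buckets


buckets = split_into_buckets(range(10), 3, seed=42)
print(sorted(len(b) for b in buckets))   # [3, 3, 4]
```

For the experiment in the question, `variant1, variant2, variant3 = split_into_buckets(all_users, 3)` would give three lists that together contain every user exactly once.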