python, python-3.x, random, sample

Generating multiple test-control variants randomly for A/B testing in Python


I want to run an A/B experiment with 3 buckets. With 2 buckets, I could split all the users into 2 sets using random.sample:

from random import sample

test = sample(all_users, k=100)
control = set(all_users) - set(test)

Since I need 3 sets of users, will the following code ensure that every user has an equal chance of ending up in any of the variants?

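NUM_USERS = int(len(all_users) * 0.33)

# sample() needs a sequence (sets are rejected since Python 3.11), so convert back to a list
variant1 = set(sample(all_users, NUM_USERS))
variant2 = set(sample(list(set(all_users) - variant1), NUM_USERS))
variant3 = set(all_users) - variant1 - variant2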

Solution

  • I think it would, but since it's code, you can easily test it empirically. For example, you could wrap the code in a function, call it a few thousand times, and check that each user ends up in each variant with roughly the same frequency.

    Another way of expressing this would be just to shuffle the elements and then pick subsets. For example, something like:

    from random import shuffle
    
    # shuffle works in place, so copy the list to leave the caller's version unchanged
    elems = list(all_users)
    shuffle(elems)
    
    # group size, rounded up so that the slices cover every element
    n = (len(elems) + 2) // 3
    
    # assign users to groups (the 3-way unpacking fails for 0, 1, 2 or 4 elements)
    g1, g2, g3 = (
      elems[i:i+n]
      for i in range(0, len(elems), n)
    )
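
The empirical check suggested above could look roughly like the sketch below. It is only an illustration: the helper names (`split_by_sampling`, `split_by_shuffle`, `assignment_frequencies`), the toy population of 9 users, and the trial count are all assumptions, not part of the question. `split_by_shuffle` also swaps the contiguous slices for a stride (`elems[i::3]`), so that it always returns exactly three groups whose sizes differ by at most one.

```python
from collections import Counter
from random import Random


def split_by_sampling(users, rng):
    """Sequential-sampling split in the style of the question (hypothetical helper)."""
    k = len(users) // 3
    variant1 = set(rng.sample(users, k))
    # sample() requires a sequence, so rebuild a list of the remaining users
    rest = [u for u in users if u not in variant1]
    variant2 = set(rng.sample(rest, k))
    variant3 = set(users) - variant1 - variant2
    return variant1, variant2, variant3


def split_by_shuffle(users, rng):
    """Shuffle, then take every third element (hypothetical helper)."""
    elems = list(users)  # copy, because shuffle works in place
    rng.shuffle(elems)
    return tuple(set(elems[i::3]) for i in range(3))


def assignment_frequencies(split, users, trials, seed=0):
    """Return {user: [share of trials spent in group 0, group 1, group 2]}."""
    rng = Random(seed)
    counts = {u: Counter() for u in users}
    for _ in range(trials):
        for idx, group in enumerate(split(users, rng)):
            for u in group:
                counts[u][idx] += 1
    return {u: [counts[u][i] / trials for i in range(3)] for u in users}


users = list(range(9))  # toy population, divisible by 3, so each share should be near 1/3
for split in (split_by_sampling, split_by_shuffle):
    freqs = assignment_frequencies(split, users, trials=20_000)
    worst = max(abs(p - 1 / 3) for shares in freqs.values() for p in shares)
    print(split.__name__, "largest deviation from 1/3:", round(worst, 3))
```

If both deviations come out small, the two approaches behave the same in practice. Keep in mind that when the groups are not the same size (a population not divisible by 3, or a fraction of 0.33 that leaves the remainder in the last group), a user's chance of landing in the larger group is slightly above one third.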