Tags: python, pandas, distribution, uniform-distribution

Create groups by random assignment in python


I have a dataset with model scores in 3 categories (high, medium, and low). The table looks like this:

| Score   |
| ------- |
| high    |
| high    |
| high    |
| low     |
| low     |
| low     |
| medium  |
| medium  |
| medium  |

I want to randomly assign these scores to 4 groups: control, treatment 1, treatment 2, and treatment 3. The control group should have 20% of the observations, and the remaining 80% should be divided equally among the 3 treatment groups. However, I want the distribution of scores (high, medium, and low) to be the same in each group. How can I do this in Python?

PS: This is just a representation of the actual table; the real one has many more observations.


Solution

  • You can try groupby.transform:

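    import numpy as np

    cats = ['control', 'treatment 1', 'treatment 2', 'treatment 3']
    probs = [.2, .8/3, .8/3, .8/3]  # 20% control; the other 80% split three ways

    # Within each Score category, draw one group label per row using probs.
    # df is the DataFrame from the question, with a 'Score' column.
    df['group'] = (
        df.groupby('Score')['Score']
          .transform(lambda x: np.random.choice(cats, size=len(x), p=probs, replace=True))
    )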
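
Note that `np.random.choice` draws each label independently, so the 20/26.7/26.7/26.7 split holds only on average, and small categories can drift noticeably from it. If the group sizes need to match the target proportions as closely as the row counts allow, one option is to build the label array with fixed counts per score category and shuffle it. The following is a minimal sketch of that idea, not part of the original answer; the helper `assign_exact`, the column name `group`, the seed, and the 150-rows-per-score sample data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

cats = ['control', 'treatment 1', 'treatment 2', 'treatment 3']
probs = np.array([0.2, 0.8 / 3, 0.8 / 3, 0.8 / 3])

# Fixed seed only so the sketch is reproducible (assumption, not a requirement)
rng = np.random.default_rng(0)


def assign_exact(scores):
    """Hypothetical helper: shuffled labels whose counts track probs for this category."""
    n = len(scores)
    counts = np.floor(probs * n).astype(int)
    # Give any rows lost to rounding to the groups with the largest fractional remainder
    remainder = probs * n - counts
    leftover = n - counts.sum()
    for i in np.argsort(-remainder)[:leftover]:
        counts[i] += 1
    labels = np.repeat(cats, counts)
    return rng.permutation(labels)


# Hypothetical sample data: 150 rows for each score category
df = pd.DataFrame({'Score': np.repeat(['high', 'medium', 'low'], 150)})
df['group'] = df.groupby('Score')['Score'].transform(assign_exact)

# Rows are scores, columns are groups; each row should show the same split
print(pd.crosstab(df['Score'], df['group']))
```

Because the counts are fixed before shuffling, every score category ends up with the same group split, up to rounding when a category's size is not evenly divisible.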