Tags: python, pandas, random, group-by

Sample each group after pandas groupby


How do I sample from each group after a groupby operation?

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4,5,6,7],
                   'b': [1,1,1,0,0,0,0]})

grouped = df.groupby('b')

Given the setup above, I want to draw a random sample from each group, for example 30% of the rows in each group.


Solution

  • Use apply with a lambda that calls sample with the frac parameter on each group (the sample is random, so the rows returned vary between runs):

    In [2]:
    df = pd.DataFrame({'a': [1,2,3,4,5,6,7],
                       'b': [1,1,1,0,0,0,0]})

    grouped = df.groupby('b')
    grouped.apply(lambda x: x.sample(frac=0.3))
    
    Out[2]:
         a  b
    b        
    0 6  7  0
    1 2  3  1
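
If you are on pandas 1.1 or later, the groupby object has its own sample method, so the lambda is not needed. A minimal sketch, assuming that version; the random_state value is an arbitrary seed that is not part of the question and is added here only to make the draw repeatable:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# Draw 30% of the rows from each group of 'b'.
# random_state=0 is an arbitrary seed (an assumption, not from the question).
sampled = df.groupby('b').sample(frac=0.3, random_state=0)

# The per-group sample size is frac * group size, rounded to the nearest
# integer, so the 4-row group and the 3-row group each contribute one row.
print(sampled)
```

Unlike the apply version, the result keeps the original row index instead of a (b, index) MultiIndex. To get a fixed number of rows per group rather than a fraction, pass n instead of frac.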