Tags: python, pandas, dataframe, random, sampling

Disproportionate stratified sampling in Pandas


How can I randomly select one row from each group (grouped by the Name column) in the following dataframe?

   Distance   Name  Time  Order
1        16   John     5      0
4        31   John     9      1
0        23   Kate     3      0
3        15   Kate     7      1
2        32  Peter     2      0
5        26  Peter     4      1

Expected result:

   Distance   Name  Time  Order
4        31   John     9      1
0        23   Kate     3      0
2        32  Peter     2      0

Solution

  • You can group by the Name column and call sample() on each group; with no arguments, sample() returns one random row:

    df.groupby('Name', as_index=False).apply(lambda x: x.sample()).reset_index(drop=True)
    

    Example output (the rows are picked at random, so they vary between runs):

        Distance   Name  Time  Order
    0        31   John     9      1
    1        15   Kate     7      1
    2        32  Peter     2      0
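
For reference, here is a minimal, self-contained sketch of the same idea, assuming pandas 1.1 or later, where groupby objects have their own sample() method. It rebuilds the dataframe from the question and draws one row per Name. The random_state argument is optional and is only there to make the draw repeatable. The `sizes` dict and the `uneven` variable are hypothetical names, added to show how a different number of rows per group (the "disproportionate" case in the title) might be drawn.

```python
import pandas as pd

# Rebuild the dataframe from the question, keeping its original index labels.
df = pd.DataFrame(
    {
        "Distance": [16, 31, 23, 15, 32, 26],
        "Name": ["John", "John", "Kate", "Kate", "Peter", "Peter"],
        "Time": [5, 9, 3, 7, 2, 4],
        "Order": [0, 1, 0, 1, 0, 1],
    },
    index=[1, 4, 0, 3, 2, 5],
)

# One random row per Name. The original index labels are preserved;
# append .reset_index(drop=True) if a fresh 0..n-1 index is wanted.
one_per_group = df.groupby("Name").sample(n=1, random_state=0)
print(one_per_group)

# Hypothetical per-group sample sizes (not from the question): a different
# number of rows per group, i.e. disproportionate stratified sampling.
sizes = {"John": 2, "Kate": 1, "Peter": 1}
uneven = pd.concat(
    group.sample(n=sizes[name], random_state=0)
    for name, group in df.groupby("Name")
)
print(uneven)
```

Unlike the apply-based version, groupby(...).sample() does not go through a Python lambda for each group, and it keeps the original row labels, which makes it easy to trace the sampled rows back to the source dataframe.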