Python: Random selection per group


Say that I have a dataframe that looks like:

Name Group_Id
AAA  1
ABC  1
CCC  2
XYZ  2
DEF  3
YYH  3

How can I randomly select one (or more) rows for each Group_Id? For example, with one random draw per Group_Id, I might get:

Name Group_Id
AAA  1
XYZ  2
DEF  3

Solution

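    import numpy as np

    size = 2        # number of rows to draw per group
    replace = True  # sample with replacement (needed if size exceeds a group's length)
    # draw `size` random index labels from each group, then select those rows
    fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace), :]
    df.groupby('Group_Id', as_index=False).apply(fn)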
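
As an alternative sketch, not part of the answer above: pandas 1.1 and later provide a sample method on groupby objects, which may be simpler than a custom lambda. The frame below is rebuilt from the question's example. The names one_per_group and two_per_group, and the random_state argument, are assumptions added here for illustration and reproducibility.

```python
import pandas as pd

# Example frame reconstructed from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# One random row per Group_Id (requires pandas >= 1.1).
# random_state is optional; it is fixed here only so reruns give the same draw.
one_per_group = df.groupby("Group_Id").sample(n=1, random_state=0)

# Two draws per group, with replacement, mirroring size = 2 / replace = True.
two_per_group = df.groupby("Group_Id").sample(n=2, replace=True, random_state=0)

print(one_per_group)
print(two_per_group)
```

Without replace=True, asking for more rows than a group contains raises an error, so sampling with replacement is the safer choice when group sizes vary.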