python pandas dataframe aggregate-functions

How to use column value as parameter in aggregation function in pandas?

Given a table like this:

A	B	C
t	r	1
t	r	1
n	j	2
n	j	2
n	j	2

I would like to group by A and B and keep, within each group, only the number of rows specified by C.

So the desired output would be:

A	B	C
t	r	1
n	j	2
n	j	2

I tried to achieve that with the following, but with no luck:

df.groupby(['A', 'B']).agg(lambda x: x.head(df.C))

Solution

You can use groupby.cumcount and boolean indexing:

out = df[df['C'].gt(df.groupby(['A', 'B']).cumcount())]

Or with a classical groupby.apply, taking the row count from the first C value of each group:

(df.groupby(['A', 'B'], sort=False, as_index=False, group_keys=False)
   .apply(lambda g: g.head(g['C'].iloc[0]))
)

Output:

   A  B  C
0  t  r  1
2  n  j  2
3  n  j  2
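
For anyone who wants to run the cumcount approach end to end, here is a minimal sketch. It assumes the sample table from the question is built by hand; the name `pos` is only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample frame from the question, typed in by hand (an assumption
# about how the data is loaded).
df = pd.DataFrame({
    "A": ["t", "t", "n", "n", "n"],
    "B": ["r", "r", "j", "j", "j"],
    "C": [1, 1, 2, 2, 2],
})

# Zero-based position of each row within its (A, B) group.
pos = df.groupby(["A", "B"]).cumcount()

# Keep a row only while its position is still below its own C value.
out = df[df["C"].gt(pos)]
print(out)
```

Because the comparison is done row by row, this needs no Python-level function call per group, which is usually why it is preferred over apply on larger frames.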

Intermediates for the groupby.cumcount approach:

   A  B  C  cumcount  C > cumcount
0  t  r  1         0          True
1  t  r  1         1         False
2  n  j  2         0          True
3  n  j  2         1          True
4  n  j  2         2         False
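
The intermediate columns shown above can be reproduced with something like the following sketch. The frame is again built by hand, and `steps` is a hypothetical name used only for this illustration.

```python
import pandas as pd

# Sample frame from the question, typed in by hand.
df = pd.DataFrame({
    "A": ["t", "t", "n", "n", "n"],
    "B": ["r", "r", "j", "j", "j"],
    "C": [1, 1, 2, 2, 2],
})

# Add the per-group row counter as a visible column.
steps = df.assign(cumcount=df.groupby(["A", "B"]).cumcount())

# Boolean mask used for filtering: True while the counter is below C.
steps["C > cumcount"] = steps["C"] > steps["cumcount"]
print(steps)
```

Rows where the last column is True are the ones kept by the boolean indexing.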