I would like to group a pandas dataframe by multiple fields ('date' and 'category'), and for each group, rank values of another field ('value') by percentile, while retaining the original ('value') field.
I have attempted:
df2 = df.groupby(['date', 'category'])['value'].rank(pct=True)
But this returns only percentiles for the 'value' field.
I believe you need assign Series
to new column:
df = pd.DataFrame({
'value':[1,3,5,7,1,0],
'category':[5] * 6,
'date':list('aaabbb')
})
df['new'] = df.groupby(['date', 'category'])['value'].rank(pct=True)
print (df)
value category date new
0 1 5 a 0.333333
1 3 5 a 0.666667
2 5 5 a 1.000000
3 7 5 b 1.000000
4 1 5 b 0.666667
5 0 5 b 0.333333
Alternative with DataFrame.assign
:
df = df.assign(new= df.groupby(['date', 'category'])['value'].rank(pct=True))