Search code examples
pythonpandaspercentile

Python percentile rank of a column, grouped by multiple other columns


I would like to group a pandas dataframe by multiple fields ('date' and 'category'), and for each group, rank values of another field ('value') by percentile, while retaining the original ('value') field.

I have attempted:

df2 = df.groupby(['date', 'category'])['value'].rank(pct=True)

But this returns only percentiles for the 'value' field.


Solution

  • I believe you need assign Series to new column:

    df = pd.DataFrame({
             'value':[1,3,5,7,1,0],
             'category':[5] * 6,
             'date':list('aaabbb')
    })
    
    
    df['new'] = df.groupby(['date', 'category'])['value'].rank(pct=True)
    print (df)
       value  category date       new
    0      1         5    a  0.333333
    1      3         5    a  0.666667
    2      5         5    a  1.000000
    3      7         5    b  1.000000
    4      1         5    b  0.666667
    5      0         5    b  0.333333
    

    Alternative with DataFrame.assign:

    df = df.assign(new= df.groupby(['date', 'category'])['value'].rank(pct=True))