python pandas dataframe data-science data-munging

pandas get percentile of value within


I have a dataframe:

d = [f1  f2  f3 
     1    2   3 
     5    1   2 
     3    3   1 
     2    4   7
     ..  ..  ..]

I want to add, for each feature in a subset of features, the percentile of each row's value within that feature's column. So for subset = [f1,f2,f3] my dataframe will be:

new_d =[f1   f2   f3  f1_per   f2_per   f3_per
         1    2   3    0         0.25     0.5
         5    1   2    1          0       0.25
         3    3   1    0.5        0.5     0
         2    4   5    0.25      0.75     1
         4    5   4    0.75       1       0.75]

What is the best way to do so?


Solution

  • Your expected output has 5 rows but your input has only 4, so the output here differs from yours. Use DataFrame.rank with pct=True on all columns and join the result back to the original:

    df = df.join(df.rank(pct=True).add_prefix('pct'))
    print (df)
      f1  f2  f3  pctf1  pctf2  pctf3
    0   1   2   3    0.2    0.4    0.6
    1   5   1   2    1.0    0.2    0.4
    2   3   3   1    0.6    0.6    0.2
    3   2   4   7    0.4    0.8    1.0
    4   4   5   4    0.8    1.0    0.8
    

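    If you need the percentile computed as (rank - 1) / (number of rows - 1), so that values run from 0 to 1 as in your expected output, apply this to the original df instead: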

    df = df.join(df.rank().sub(1).div(len(df) - 1).add_prefix('pct'))
    print (df)
       f1  f2  f3  pctf1  pctf2  pctf3
    0   1   2   3   0.00   0.25   0.50
    1   5   1   2   1.00   0.00   0.25
    2   3   3   1   0.50   0.50   0.00
    3   2   4   7   0.25   0.75   1.00
    4   4   5   4   0.75   1.00   0.75
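
The question asks for a subset of features and `_per` column names, while the snippets above rank every column and use a `pct` prefix. A minimal sketch of that variant follows, assuming the 5-row data from the printed output; the `subset` list and the names `pct` and `result` are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Example data matching the 5-row output printed above.
df = pd.DataFrame({
    "f1": [1, 5, 3, 2, 4],
    "f2": [2, 1, 3, 4, 5],
    "f3": [3, 2, 1, 7, 4],
})

# Hypothetical subset of features; any list of column names from df works.
subset = ["f1", "f2"]

# (rank - 1) / (n - 1) maps the smallest value to 0 and the largest to 1.
# Assumes len(df) > 1; with a single row this would divide 0 by 0.
pct = df[subset].rank().sub(1).div(len(df) - 1).add_suffix("_per")

# Join only the new percentile columns back onto the untouched original.
result = df.join(pct)
print(result)
```

With tied values, `rank()` assigns the average rank by default, so tied rows share the same percentile. For the `rank(pct=True)` behaviour instead, replacing the `pct` line with `df[subset].rank(pct=True).add_suffix("_per")` should work the same way.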