Search code examples
pythonpandasdataframepercentile

How can I get percentile of column in dataframe considering only previous values? (Python)


I have a dataframe with a numeric column and I would link to calculate the percentile of the values in each row for that column considering only previous rows of the column. Here is an example:

+-------+
| col_1 |
+-------+
|  5    |
+-------+
|  4    |
+-------+
|  10   |
+-------+
|  1    |
+-------+
|  7    |
+-------+

I would like to obtain a dataframe like this:

+-------+------------+
| col_1 | percentile |
+-------+------------+
|  5    |     100    |
+-------+------------+
|  4    |     50     |
+-------+------------+
|  10   |     100    |
+-------+------------+
|  1    |     25     |
+-------+------------+
|  7    |     80     |
+-------+------------+

How can I calculate it?


Solution

  • Try as follows.

    • Use df.expanding with min_periods=1 to allow expanding window calculations.
    • For each window, we apply Expanding.rank with pct=True (and we multiply by 100).

    We can assign the result directly to the new column percentile:

    import pandas as pd
    
    data = {'col_1':[5,4,10,1,7]}
    df = pd.DataFrame(data)
    
    df['percentile'] = df['col_1'].expanding(min_periods=1).rank(pct=True).mul(100)
    
    print(df)
    
       col_1  percentile
    0      5       100.0
    1      4        50.0
    2     10       100.0
    3      1        25.0
    4      7        80.0
    

    Update: Expanding.rank was added to pandas in version 1.4.0. For earlier versions, you could for instance try:

    temp = df['col_1'].expanding(min_periods=1).agg(['rank','count'])
    df['percentile'] = (temp['rank']/temp['count']).mul(100)
    
    print(df)
    
       col_1  percentile
    0      5       100.0
    1      4        50.0
    2     10       100.0
    3      1        25.0
    4      7        80.0
    

    Or, as a one-liner:

    df['percentile'] = df['col_1'].expanding(min_periods=1)\
        .apply(lambda x: (x.rank()/x.count()).to_numpy()[-1]*100)