Search code examples
pythonpandasdataframepandas-groupbydata-processing

Rolling quantiles over a column in pandas


I have a table as such

+------+------------+-------+
| Idx  |    date    | value |
+------+------------+-------+
| A    | 20/11/2016 |    10 | 
| A    | 21/11/2016 |    8  | 
| A    | 22/11/2016 |    12 |
| B    | 20/11/2016 |    16 |
| B    | 21/11/2016 |    18 |
| B    | 22/11/2016 |    11 |
+------+------------+-------+

I'd like to create a column that creates a new column 'rolling_quantile_value' based on the column 'value' that calculates a quantile based on the past for each row and each possible Idx.

For the example above, if the quantile chosen is median, the output should look like this :

+------+------------+-------+-----------------------+
| Idx  |    date    | value | rolling_median_value  |
+------+------------+-------+-----------------------+
| A    | 20/11/2016 |    10 |      NaN              |
| A    | 21/11/2016 |    8  |      10               |
| A    | 22/11/2016 |    12 |      9                |
| A    | 23/11/2016 |    14 |      10               |
| B    | 20/11/2016 |    16 |      NaN              |
| B    | 21/11/2016 |    18 |      16               |
| B    | 22/11/2016 |    11 |      17               |
+------+------------+-------+-----------------------+

I've done it the naive way where I just put a function that creates row by row based on precedents rows of value and flags the jump from one Id to another but I'm sure that it's not the most efficient way to do that, nor the most elegant.

Looking forward to your suggestions !


Solution

  • I think you want expanding

    df['rolling_median_value']=(df.groupby('Idx',sort=False)
                                  .expanding(1)['value']
                                  .median()
                                  .groupby(level=0)
                                  .shift()
                                  .reset_index(drop=True))
    print(df)
      Idx        date  value  rolling_median_value
    0   A  20/11/2016     10                   NaN
    1   A  21/11/2016      8                  10.0
    2   A  22/11/2016     12                   9.0
    3   A  23/11/2016     14                  10.0
    4   B  20/11/2016     16                   NaN
    5   B  21/11/2016     18                  16.0
    6   B  22/11/2016     11                  17.0
    

    UPDATE

    df['rolling_quantile_value']=(df.groupby('Idx',sort=False)
                                    .expanding(1)['value']
                                    .quantile(0.75)
                                    .groupby(level=0)
                                    .shift()
                                    .reset_index(drop=True))
    print(df)
      Idx        date  value rolling_quantile_value
    0   A  20/11/2016     10                   NaN
    1   A  21/11/2016      8                  10.0
    2   A  22/11/2016     12                   9.5
    3   A  23/11/2016     14                  11.0
    4   B  20/11/2016     16                   NaN
    5   B  21/11/2016     18                  16.0
    6   B  22/11/2016     11                  17.5