python pandas pandas-groupby median rolling-computation

Calculating moving median within group

I want to perform rolling median on price column over 4 days back, data will be groupped by date. So basically I want to take prices for a given day and all prices for 4 days back and calculate median out of these values.

Here are the sample data:

id      date        price
1637027 2020-01-21  7045204.0
280955  2020-01-11  3590000.0
782078  2020-01-28  2600000.0
1921717 2020-02-17  5500000.0
1280579 2020-01-23  869000.0
2113506 2020-01-23  628869.0
580638  2020-01-25  650000.0
1843598 2020-02-29  969000.0
2300960 2020-01-24  5401530.0
1921380 2020-02-19  1220000.0
853202  2020-02-02  2990000.0
1024595 2020-01-27  3300000.0
565202  2020-01-25  3540000.0
703824  2020-01-18  3990000.0
426016  2020-01-26  830000.0

I got close with combining rolling and groupby:

df.groupby('date').rolling(window = 4, on = 'date')['price'].median()

But this seems to add one row per each index value and by median definition, I am not able to somehow merge these rows to produce one result per row.

Result now looks like this:

date        date      
2020-01-10  2020-01-10          NaN
            2020-01-10          NaN
            2020-01-10          NaN
            2020-01-10    3070000.0
            2020-01-10    4890000.0
                            ...    
2020-03-11  2020-03-11    4290000.0
            2020-03-11    3745000.0
            2020-03-11    3149500.0
            2020-03-11    3149500.0
            2020-03-11    3149500.0
Name: price, Length: 389716, dtype: float64

It seems it just deleted 3 first values and then just printed price value.

Is it possible to get one lagged / moving median value per one date?

Solution

You can use rolling with a frequency window of 5 days to get today and last 4 days, then drop_duplicates to keep the last row per day. First create a copy (if you want to keep the original one), sort_values per date and ensure the date column is datetime

#sort and change to datetime
df_f = df[['date','price']].copy().sort_values('date')
df_f['date'] = pd.to_datetime(df_f['date'])

#create the column rolling
df_f['price'] = df_f.rolling('5D', on='date')['price'].median()

#drop_duplicates and keep the last row per day
df_f = df_f.drop_duplicates(['date'], keep='last').reset_index(drop=True)

print (df_f)

         date      price
0  2020-01-11  3590000.0
1  2020-01-18  3990000.0
2  2020-01-21  5517602.0
3  2020-01-23   869000.0
4  2020-01-24  3135265.0
5  2020-01-25  2204500.0
6  2020-01-26   849500.0
7  2020-01-27   869000.0
8  2020-01-28  2950000.0
9  2020-02-02  2990000.0
10 2020-02-17  5500000.0
11 2020-02-19  3360000.0
12 2020-02-29   969000.0