I have the following pandas dataframe
df = pd.DataFrame(data = {
'item': ['red','red','red','blue','blue'],
'dt': pd.to_datetime(['2018-01-31', '2018-02-28', '2018-03-31', '2018-01-31', '2018-03-31']),
's': [3.2, 4.8, 5.1, 5.3, 5.8],
'r': [1,2,3,4,5],
't': [7,8,9,10,11],
})
which looks like
item dt s r t
0 red 2018-01-31 3.2 1 7
1 red 2018-02-28 4.8 2 8
2 red 2018-03-31 5.1 3 9
3 blue 2018-01-31 5.3 4 10
4 blue 2018-03-31 5.8 5 11
Note that the time points are irregular: "blue" is missing February data. All dates are valid end-of-month dates.
I'd like to add a column which is the "s value from two months ago", ideally something like
df['s_lag2m'] = df.set_index(['item','dt'])['s'].shift(2, 'M')
and I would get
item dt s r t s_lag2m
0 red 2018-01-31 3.2 1 7 NaN
1 red 2018-02-28 4.8 2 8 NaN
2 red 2018-03-31 5.1 3 9 3.2
3 blue 2018-01-31 5.3 4 10 NaN
4 blue 2018-03-31 5.8 5 11 5.3
But that doesn't work; it throws NotImplementedError: Not supported for type MultiIndex
.
How can I do this?
We can do reindex
after set_index
with only dt
df['New']=df.set_index(['dt']).groupby('item')['s'].shift(2, 'M').\
reindex(pd.MultiIndex.from_frame(df[['item','dt']])).values
df
item dt s r t New
0 red 2018-01-31 3.2 1 7 NaN
1 red 2018-02-28 4.8 2 8 NaN
2 red 2018-03-31 5.1 3 9 3.2
3 blue 2018-01-31 5.3 4 10 NaN
4 blue 2018-03-31 5.8 5 11 5.3