For a dataframe of:
import pandas as pd
df = pd.DataFrame({
'dt':[
'2019-01-01',
'2019-01-02',
'2019-01-03',
'2020-01-01',
'2020-01-02',
'2020-01-03',
'2019-01-01',
'2019-01-02',
'2019-01-03',
'2020-01-01',
'2020-01-02',
'2020-01-03'
],
'foo': [1,2,3, 4,5,6, 1,5,3, 4,10,6],
'category': [1,1,1,1,1,1, 2,2,2,2,2,2]
})
How can I find the lagged value from the previous year for each category?
df['dt'] = pd.to_datetime(df['dt'])
display(df)
Shifting the index only returns an empty result, and thus fails assignment.
df['last_year'] = df[df.dt == df.dt - pd.offsets.Day(365)]
Obviously, a join with the data from 2019 on the month and day would work - but seem rahter cumbersome. Is there a better way?
the desired result:
dt foo category last_year
2020-01-01 4 1 1
2020-01-02 5 1 2
2020-01-03 6 1 3
2020-01-01 4 2 1
2020-01-02 10 2 5
2020-01-03 6 2 3
you can merge
df with itself after you assign
the column dt with the difference you want with pd.DateOffset
.
print (df.merge(df.assign(dt=lambda x: x['dt']+pd.DateOffset(years=1)),
on=['dt', 'category'],
suffixes=('','_lastYear'),
how='left'))
dt foo category foo_lastYear
0 2019-01-01 1 1 NaN
1 2019-01-02 2 1 NaN
2 2019-01-03 3 1 NaN
3 2020-01-01 4 1 1.0
4 2020-01-02 5 1 2.0
5 2020-01-03 6 1 3.0
6 2019-01-01 1 2 NaN
7 2019-01-02 5 2 NaN
8 2019-01-03 3 2 NaN
9 2020-01-01 4 2 1.0
10 2020-01-02 10 2 5.0
11 2020-01-03 6 2 3.0