Search code examples
pythonpandastime-serieslagshift

pandas lagged value of time series (previous year) for cohort


For a dataframe of:

import pandas as pd

df = pd.DataFrame({
    'dt':[
        '2019-01-01',
        '2019-01-02',
        '2019-01-03',
        '2020-01-01',
        '2020-01-02',
        '2020-01-03',
        '2019-01-01',
        '2019-01-02',
        '2019-01-03',
        '2020-01-01',
        '2020-01-02',
        '2020-01-03'
    ],
    'foo': [1,2,3, 4,5,6, 1,5,3, 4,10,6],

    'category': [1,1,1,1,1,1,  2,2,2,2,2,2]
})

How can I find the lagged value from the previous year for each category?

df['dt'] = pd.to_datetime(df['dt'])
display(df)

Shifting the index only returns an empty result, and thus fails assignment.

df['last_year'] = df[df.dt == df.dt - pd.offsets.Day(365)]

Obviously, a join with the data from 2019 on the month and day would work - but seem rahter cumbersome. Is there a better way?

edit

the desired result:

dt  foo     category  last_year
2020-01-01  4   1      1
2020-01-02  5   1      2
2020-01-03  6   1      3
2020-01-01  4   2      1
2020-01-02  10  2      5
2020-01-03  6   2      3

Solution

  • you can merge df with itself after you assign the column dt with the difference you want with pd.DateOffset.

    print (df.merge(df.assign(dt=lambda x: x['dt']+pd.DateOffset(years=1)), 
                    on=['dt', 'category'], 
                    suffixes=('','_lastYear'), 
                    how='left'))
               dt  foo  category  foo_lastYear
    0  2019-01-01    1         1           NaN
    1  2019-01-02    2         1           NaN
    2  2019-01-03    3         1           NaN
    3  2020-01-01    4         1           1.0
    4  2020-01-02    5         1           2.0
    5  2020-01-03    6         1           3.0
    6  2019-01-01    1         2           NaN
    7  2019-01-02    5         2           NaN
    8  2019-01-03    3         2           NaN
    9  2020-01-01    4         2           1.0
    10 2020-01-02   10         2           5.0
    11 2020-01-03    6         2           3.0