Search code examples
pandaslag

How to do a calendar day lag in pandas


I am trying to do a calendar lag to a time series data which i am trying to analyse. • For x month lags, return the value x number of months ago. (3 month lag from 17th May would be data on the 17th Feb).

Is there any solution available directly in pandas.

df_list = pd.date_range(start=min_date, end=max_date,freq='D').to_frame(index=False)
df_list.columns = ['name']
df_list.set_index('name',inplace = True)
df = df_list.reindex(df_list.index - pd.to_timedelta(30, unit='d'), 
method='nearest')

This code is not able to return correctly for 01-march with 1 month lag which should ideally be 1-Feb.

And Mar 31, 30, 29, 28 should be Feb 28 (all using a 1-month lag).


Solution

  • This is pd.offsets.DateOffset(months=1). It even gets leap years correct. It makes use of dateutil.relavitedelta logic, which does the date subtraction as you need.

    import pandas as pd
    dates = pd.to_datetime(['2018-03-01', '2018-03-31', '2018-03-30', 
                            '2018-03-29', '2018-03-28', '2016-03-31'])
    df = pd.DataFrame({'dates': dates})
    #       dates
    #0 2018-03-01
    #1 2018-03-31
    #2 2018-03-30
    #3 2018-03-29
    #4 2018-03-28
    #5 2016-03-31
    
    df.dates - pd.offsets.DateOffset(months=1)
    #0   2018-02-01
    #1   2018-02-28
    #2   2018-02-28
    #3   2018-02-28
    #4   2018-02-28
    #5   2016-02-29
    #Name: dates, dtype: datetime64[ns]