Search code examples
pythonpandasdataframedate-rangeshift

how to get the shifted index value of a dataframe in Pandas?


Consider the simple example below:

date = pd.date_range('1/1/2011', periods=5, freq='H')

df = pd.DataFrame({'cat' : ['A', 'A', 'A', 'B',
                         'B']}, index = date)
df
Out[278]: 
                    cat
2011-01-01 00:00:00   A
2011-01-01 01:00:00   A
2011-01-01 02:00:00   A
2011-01-01 03:00:00   B
2011-01-01 04:00:00   B

I want to create a variable that contains the lagged/lead value of the index. That is something like:

df['index_shifted']=df.index.shift(1)

So, for instance, at time 2011-01-01 01:00:00 I expect the variable index_shifted to be 2011-01-01 00:00:00

How can I do that? Thanks!


Solution

  • I think you need Index.shift with -1:

    df['index_shifted']= df.index.shift(-1)
    print (df)
                        cat       index_shifted
    2011-01-01 00:00:00   A 2010-12-31 23:00:00
    2011-01-01 01:00:00   A 2011-01-01 00:00:00
    2011-01-01 02:00:00   A 2011-01-01 01:00:00
    2011-01-01 03:00:00   B 2011-01-01 02:00:00
    2011-01-01 04:00:00   B 2011-01-01 03:00:00
    

    For me it works without freq, but maybe it is necessary in real data:

    df['index_shifted']= df.index.shift(-1, freq='H')
    print (df)
                        cat       index_shifted
    2011-01-01 00:00:00   A 2010-12-31 23:00:00
    2011-01-01 01:00:00   A 2011-01-01 00:00:00
    2011-01-01 02:00:00   A 2011-01-01 01:00:00
    2011-01-01 03:00:00   B 2011-01-01 02:00:00
    2011-01-01 04:00:00   B 2011-01-01 03:00:00
    

    EDIT:

    If freq of DatetimeIndex is None, you need add freq to shift:

    import pandas as pd
    
    date = pd.date_range('1/1/2011', periods=5, freq='H').union(pd.date_range('5/1/2011', periods=5, freq='H'))
    
    
    df = pd.DataFrame({'cat' : ['A', 'A', 'A', 'B',
                             'B','A', 'A', 'A', 'B',
                             'B']}, index = date)
    
    print (df.index)
    DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
                   '2011-01-01 02:00:00', '2011-01-01 03:00:00',
                   '2011-01-01 04:00:00', '2011-05-01 00:00:00',
                   '2011-05-01 01:00:00', '2011-05-01 02:00:00',
                   '2011-05-01 03:00:00', '2011-05-01 04:00:00'],
                  dtype='datetime64[ns]', freq=None)
    
    df['index_shifted']= df.index.shift(-1, freq='H')
    print (df)
                        cat       index_shifted
    2011-01-01 00:00:00   A 2010-12-31 23:00:00
    2011-01-01 01:00:00   A 2011-01-01 00:00:00
    2011-01-01 02:00:00   A 2011-01-01 01:00:00
    2011-01-01 03:00:00   B 2011-01-01 02:00:00
    2011-01-01 04:00:00   B 2011-01-01 03:00:00
    2011-05-01 00:00:00   A 2011-04-30 23:00:00
    2011-05-01 01:00:00   A 2011-05-01 00:00:00
    2011-05-01 02:00:00   A 2011-05-01 01:00:00
    2011-05-01 03:00:00   B 2011-05-01 02:00:00
    2011-05-01 04:00:00   B 2011-05-01 03:00:00