python pandas vectorization back-testing

How to vectorize an operation that uses previous values?


I want to do something like this:

df['indicator'] = df.at[x-1] + df.at[x-2]

or

df['indicator'] = df.at[x-1] > df.at[x-2]

I assume edge cases, such as the first few rows that have no previous values, would be handled automatically.


Solution

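  • Use shift() to line each row up with the values before it. The first two rows of the indicator column are automatically filled with NaN, because they do not have two previous values. Access the column with brackets (df['at']), since df.at is a pandas accessor rather than the column:

    df['indicator'] = df['at'].shift(1) + df['at'].shift(2)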

    For example, if we had the following dataframe:

    import pandas as pd

    df = pd.DataFrame({'date': ['2017-06-01', '2017-06-02', '2017-06-03',
                                '2017-06-04', '2017-06-05', '2017-06-06'],
                       'at': [10, 15, 17, 5, 3, 7]})
    
    
              date     at
    0   2017-06-01     10
    1   2017-06-02     15
    2   2017-06-03     17
    3   2017-06-04      5
    4   2017-06-05      3
    5   2017-06-06      7
    

    Then running this line will give the below result:

    df['indicator'] = df['at'].shift(1) + df['at'].shift(2)
    
              date  at   indicator
    0   2017-06-01  10         NaN
    1   2017-06-02  15         NaN
    2   2017-06-03  17        25.0
    3   2017-06-04   5        32.0
    4   2017-06-05   3        22.0
    5   2017-06-06   7         8.0
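
The question also asks about the comparison variant (previous value greater than the one before it). The same shift() approach should work there, with one caveat: a comparison against NaN evaluates to False rather than NaN, so the first two rows come out as False unless they are masked. Below is a minimal sketch under that assumption; the column names 'rising' and 'rising_masked' are made up for illustration and are not from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2017-06-01', '2017-06-02', '2017-06-03',
             '2017-06-04', '2017-06-05', '2017-06-06'],
    'at': [10, 15, 17, 5, 3, 7],
})

# Values one and two rows back; rows without history hold NaN.
prev1 = df['at'].shift(1)
prev2 = df['at'].shift(2)

# Plain comparison: any comparison involving NaN is False,
# so the first two rows are False here, not missing.
# ('rising' is a hypothetical column name.)
df['rising'] = prev1 > prev2

# Optional: keep the first two rows as missing instead of False
# by masking rows where either shifted value is NaN.
# ('rising_masked' is a hypothetical column name.)
has_history = prev1.notna() & prev2.notna()
df['rising_masked'] = (prev1 > prev2).where(has_history)

print(df)
```

Whether False or a missing value is the right result for the first rows depends on how the indicator is used downstream; in a back-test, the masked version makes it harder to mistake "no data yet" for "not rising".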