Search code examples
pythonpandasdatelagshift

Simple way to shift/lag data in certain columns in df by n adjacent dates


I have a df and would like to shift data in a specific column backwards based on the name and datetime index by the previous adjacent dates (not necessarily fixed length of days), say the due to data collection error resulting in the scores being recorded on the wrong date.

For example, take the original df:

Date Name Score
2020-01-01 John 9
2020-01-01 James 8
2020-03-05 John 6
2020-03-05 James 7
2020-07-20 John 5
2020-07-20 James 4

And the corrected df after shifting ['Score'] back by one adjacent date would look like:

Date Name Score
2020-01-01 John 6
2020-01-01 James 7
2020-03-05 John 5
2020-03-05 James 4
2020-07-20 John NA
2020-07-20 James NA

Seems like there is a straightforward solution to this, but I tried making a copy of the data frame and using += 1 but ran into trouble due to dates being in datetime format.

Thank you so much in advanced!


Solution

  • Assuming the dates are sorted. Use groupby + shift:

    df['prev_Score'] = df.groupby('Name')['Score'].shift(-1)
    

    NB. as a new column here for clarity.

    Output:

             Date   Name  Score  prev_Score
    0  2020-01-01   John      9         6.0
    1  2020-01-01  James      8         7.0
    2  2020-03-05   John      6         5.0
    3  2020-03-05  James      7         4.0
    4  2020-07-20   John      5         NaN
    5  2020-07-20  James      4         NaN