I have a df and would like to shift data in a specific column backwards based on the name and datetime index by the previous adjacent dates (not necessarily fixed length of days), say the due to data collection error resulting in the scores being recorded on the wrong date.
For example, take the original df:
Date | Name | Score |
---|---|---|
2020-01-01 | John | 9 |
2020-01-01 | James | 8 |
2020-03-05 | John | 6 |
2020-03-05 | James | 7 |
2020-07-20 | John | 5 |
2020-07-20 | James | 4 |
And the corrected df after shifting ['Score'] back by one adjacent date would look like:
Date | Name | Score |
---|---|---|
2020-01-01 | John | 6 |
2020-01-01 | James | 7 |
2020-03-05 | John | 5 |
2020-03-05 | James | 4 |
2020-07-20 | John | NA |
2020-07-20 | James | NA |
Seems like there is a straightforward solution to this, but I tried making a copy of the data frame and using += 1 but ran into trouble due to dates being in datetime format.
Thank you so much in advanced!
Assuming the dates are sorted. Use groupby
+ shift
:
df['prev_Score'] = df.groupby('Name')['Score'].shift(-1)
NB. as a new column here for clarity.
Output:
Date Name Score prev_Score
0 2020-01-01 John 9 6.0
1 2020-01-01 James 8 7.0
2 2020-03-05 John 6 5.0
3 2020-03-05 James 7 4.0
4 2020-07-20 John 5 NaN
5 2020-07-20 James 4 NaN