I have a dataframe sorted in descending order date that records the Rank of students in class and the predicted score.
Date Student_ID Rank Predicted_Score
4/7/2021 33 2 87
13/6/2021 33 4 88
31/3/2021 33 7 88
28/2/2021 33 2 86
14/2/2021 33 10 86
31/1/2021 33 8 86
23/12/2020 33 1 81
8/11/2020 33 3 80
21/10/2020 33 3 80
23/9/2020 33 4 80
20/5/2020 33 3 80
29/4/2020 33 4 80
15/4/2020 33 2 79
26/2/2020 33 3 79
12/2/2020 33 5 79
29/1/2020 33 1 70
I want to create a column called Recent_Predicted_Score
that record the last predicted_score where that student actually ranks top 3. So the desired outcome looks like
Date Student_ID Rank Predicted_Score Recent_Predicted_Score
4/7/2021 33 2 87 86
13/6/2021 33 4 88 86
31/3/2021 33 7 88 86
28/2/2021 33 2 86 81
14/2/2021 33 10 86 81
31/1/2021 33 8 86 81
23/12/2020 33 1 81 80
8/11/2020 33 3 80 80
21/10/2020 33 3 80 80
23/9/2020 33 4 80 80
20/5/2020 33 3 80 79
29/4/2020 33 4 80 79
15/4/2020 33 2 79 79
26/2/2020 33 3 79 70
12/2/2020 33 5 79 70
29/1/2020 33 1 70
Here's what I have tried but it doesn't quite work, not sure if I am on the right track:
df.sort_values(by = ['Student_ID', 'Date'], ascending = [True, False], inplace = True)
lp1 = df['Predicted_Score'].where(df['Rank'].isin([1,2,3])).groupby(df['Student_ID']).bfill()
lp2 = df.groupby(['Student_ID', 'Rank'])['Predicted_Score'].shift(-1)
df = df.assign(Recent_Predicted_Score=lp1.mask(df['Rank'].isin([1,2,3]), lp2))
Thanks in advance.
Try:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.sort_values(['Student_ID', 'Date'])
df['Recent_Predicted_Score'] = np.where(df['Rank'].isin([1, 2, 3]), df['Predicted_Score'], np.nan)
df['Recent_Predicted_Score'] = df.groupby('Student_ID', group_keys=False)['Recent_Predicted_Score'].apply(lambda x: x.ffill().shift().fillna(''))
df = df.sort_values(['Student_ID', 'Date'], ascending = [True, False])
print(df)
Prints:
Date Student_ID Rank Predicted_Score Recent_Predicted_Score
0 2021-07-04 33 2 87 86.0
1 2021-06-13 33 4 88 86.0
2 2021-03-31 33 7 88 86.0
3 2021-02-28 33 2 86 81.0
4 2021-02-14 33 10 86 81.0
5 2021-01-31 33 8 86 81.0
6 2020-12-23 33 1 81 80.0
7 2020-11-08 33 3 80 80.0
8 2020-10-21 33 3 80 80.0
9 2020-09-23 33 4 80 80.0
10 2020-05-20 33 3 80 79.0
11 2020-04-29 33 4 80 79.0
12 2020-04-15 33 2 79 79.0
13 2020-02-26 33 3 79 70.0
14 2020-02-12 33 5 79 70.0
15 2020-01-29 33 1 70