In the following example df, what is the best approach to keep:
- the first row in which a Score appears for each id
- each later row where the Score changes for that id
and drop the duplicated rows until it changes?

Example df
         date  id  Score
0  2001-09-06   1      3
1  2001-09-07   1      3
2  2001-09-08   1      4
3  2001-09-09   2      6
4  2001-09-10   2      6
5  2001-09-11   1      4
6  2001-09-12   2      5
7  2001-09-13   2      5
8  2001-09-14   1      3
Desired df
         date  id  Score
0  2001-09-06   1      3
1  2001-09-08   1      4
2  2001-09-09   2      6
3  2001-09-12   2      5
4  2001-09-14   1      3
Use groupby with diff:
print(df[df.groupby("id")["Score"].diff() != 0])
         date  id  Score
0  2001-09-06   1      3
2  2001-09-08   1      4
3  2001-09-09   2      6
6  2001-09-12   2      5
8  2001-09-14   1      3
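The first appearance of each id always yields NaN from diff, because there is no previous row in that group to subtract. NaN != 0 evaluates to True, so that first row is kept, and every later row is kept only when its Score differs from the previous Score for the same id.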
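For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example frame and applies the same mask. The variable names `changed` and `result` are just illustrative, and the `reset_index(drop=True)` step is an addition, assuming you want the 0..4 index shown in the desired df rather than the original row labels.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "date": pd.date_range("2001-09-06", periods=9, freq="D"),
    "id": [1, 1, 1, 2, 2, 1, 2, 2, 1],
    "Score": [3, 3, 4, 6, 6, 4, 5, 5, 3],
})

# Per-id difference from the previous Score; the first row of each id is NaN.
# NaN != 0 is True, so first appearances are kept along with real changes.
changed = df.groupby("id")["Score"].diff() != 0

# Filter, then renumber the index (assumed to be wanted, as in the desired df).
result = df[changed].reset_index(drop=True)
print(result)
```

Note that diff relies on subtraction, so this assumes Score is numeric.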