My data frame has two columns. I am trying to collapse each run of consecutive duplicate values in column "a", dropping the corresponding rows of column "b", while keeping only the last element of each run.
import pandas as pd
a=[5,5,5,6,6,6,7,5,4,1,8,9]
b=[50,40,45,87,88,54,12,75,55,87,46,98]
df = pd.DataFrame(list(zip(a,b)), columns =['Patch', 'Reward'])
df=df.drop_duplicates(subset='Patch', keep="last")
df = df.set_index('Patch')
print(df)
When I run this I get:
       Reward
Patch
6          54
7          12
5          75
4          55
1          87
8          46
9          98
However, what I want is:
Patch Reward
5 45
6 54
7 12
5 75
4 55
1 87
8 46
9 98
PS: I don't want duplicates that reappear later in the series, after some other element, to be removed. Only consecutive duplicates should be removed, keeping the last row of each consecutive run. I also don't want the result to be sorted; the rows should appear in the same order as in the lists.
You can create a new column that assigns an id to each group of consecutive equal elements, then do a groupby operation on that id followed by a last aggregation.
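import pandas as pd
a=[5,5,5,6,6,6,7,5,4,1,8,9]
b=[50,40,45,87,88,54,12,75,55,87,46,98]
df = pd.DataFrame(list(zip(a,b)), columns =['Patch', 'Reward'])
# a new run starts wherever Patch differs from the previous row; cumsum numbers the runs
df["group_id"]=(df.Patch != df.Patch.shift()).cumsum()
# keep the last row of each run (the result is indexed by group_id)
df = df.groupby("group_id").last()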
Output
Patch Reward
5 45
6 54
7 12
5 75
4 55
1 87
8 46
9 98
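If you would rather not add a helper column, the same idea can probably be expressed as a boolean mask. This is only a sketch of an alternative, not the groupby approach above: it keeps a row when the next row holds a different Patch value, which makes it the last row of its consecutive run. The names is_last_of_run and result are just illustrative, and it assumes Patch contains no NaN values (consecutive NaNs would all be kept, since NaN never compares equal).

```python
import pandas as pd

a = [5, 5, 5, 6, 6, 6, 7, 5, 4, 1, 8, 9]
b = [50, 40, 45, 87, 88, 54, 12, 75, 55, 87, 46, 98]
df = pd.DataFrame({"Patch": a, "Reward": b})

# True where the next row has a different Patch value, i.e. this row
# is the last one of its consecutive run. The final row is compared
# against NaN, which is never equal, so it is always kept.
is_last_of_run = df["Patch"] != df["Patch"].shift(-1)

# Filter with the mask; the original row order is preserved.
result = df[is_last_of_run].reset_index(drop=True)
print(result)
```

The reset_index(drop=True) call is optional; without it the rows keep their original positions (2, 5, 6, ...) as the index.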