I have a DataFrame from which I want to drop rows whenever consecutive rows share the same value in column A. Within each run of consecutive identical A values I want to keep just one row: the one with the highest B if A is 'high', or the one with the lowest B if A is 'low'. In essence, I'm trying to end up with highs followed only by lows, and lows followed only by highs.
import pandas as pd

df = pd.DataFrame({'A': ['low', 'high', 'high', 'low', 'low', 'low'],
                   'B': [10, 70, 90, 40, 50, 60]})
Output:
      A   B
0   low  10
1  high  70
2  high  90
3   low  40
4   low  50
5   low  60
Desired:
      A   B
0   low  10
2  high  90
3   low  40
Trying to get my head around how to implement the logic and have run into a brick wall.
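The key piece of logic is a label for each run of consecutive identical values in A, so that every run can then be reduced to a single row. A minimal sketch using the example df (the names new_run and run_id are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': ['low', 'high', 'high', 'low', 'low', 'low'],
                   'B': [10, 70, 90, 40, 50, 60]})

# True wherever A differs from the row above; the first row is compared
# against NaN (from shift), so it is always True
new_run = df['A'].ne(df['A'].shift())

# A running total of those flags gives each consecutive run its own integer ID
run_id = new_run.cumsum()
print(run_id.tolist())  # [1, 2, 2, 3, 3, 3]
```

Grouping by run_id instead of by A alone is what keeps the two separate 'low' runs apart.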
Here is another way:
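# Flip the sign of B in 'low' rows so the lowest B becomes the largest value,
# letting a single max() handle both 'high' and 'low' runs
d = {'low': -1}
(df.assign(B=df['B'].mul(df['A'].map(d), fill_value=1))
   # group by A and by a run ID that increases every time A changes
   .groupby(['A', pd.Series(pd.factorize(df['A'])[0]).diff().ne(0).cumsum()]).max()
   .abs()                 # undo the sign flip
   .sort_index(level=1)   # put the runs back in their original order
   .droplevel(1)          # drop the run ID level
   .reset_index())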
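The sign flip is what lets one max() serve both cases. A small sketch of the intermediate values, assuming the same example df (signed is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'A': ['low', 'high', 'high', 'low', 'low', 'low'],
                   'B': [10, 70, 90, 40, 50, 60]})

# 'high' is not in the mapping, so map() yields NaN there;
# fill_value=1 replaces that NaN with 1 before multiplying
signed = df['B'].mul(df['A'].map({'low': -1}), fill_value=1)
print(signed.tolist())  # [-10.0, 70.0, 90.0, -40.0, -50.0, -60.0]

# In the last 'low' run (rows 3-5) the largest signed value
# belongs to the smallest original B
print(signed.iloc[3:].max())  # -40.0
```

Because the map produces NaN, the result is float, which is why B comes back as 10.0, 90.0, 40.0 in the first version.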
or
df.loc[df['A'].map({'low':-1}).mul(df['B'],fill_value=1).groupby(df['A'].ne(df['A'].shift()).cumsum()).idxmax()]
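Output (from the first version; the idxmax version returns rows 0, 2 and 3 with their original index labels and integer B, exactly as in the desired output):
      A     B
0   low  10.0
1  high  90.0
2   low  40.0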
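If the sign trick feels too clever, the same rows can be picked by choosing idxmax or idxmin explicitly for each run. A minimal sketch of that more verbose, swapped-in technique (not one of the versions above; keep is a hypothetical name, and any value other than 'high' is treated as 'low'):

```python
import pandas as pd

df = pd.DataFrame({'A': ['low', 'high', 'high', 'low', 'low', 'low'],
                   'B': [10, 70, 90, 40, 50, 60]})

# One integer ID per run of consecutive identical A values
run_id = df['A'].ne(df['A'].shift()).cumsum()

# For each run, take the index label of the max B if the run is 'high',
# otherwise the label of the min B
keep = [
    grp['B'].idxmax() if grp['A'].iat[0] == 'high' else grp['B'].idxmin()
    for _, grp in df.groupby(run_id)
]

result = df.loc[keep]  # selects rows 0, 2 and 3 with their original index
print(result)
```

This avoids the float conversion and states the high/low rule directly, at the cost of a Python-level loop over runs.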