I have the following data frame:
Column A Column B
MD223 GATE IN
MD223 GATE OUT
MD223 LOADED
MD223 DEPARTURE
SC511 GATE IN
SC511 LOADED
SC511 SHIPPED
KR977 DISPATCHED
KR977 LOADED
KR977 SHIPPED
What I am trying to do is for each group of the same values in column A to find the first row with the value 'LOADED' in column B and then return rows before the value 'LOADED'. So expected outcome would be:
Column A Column B
MD223 GATE IN
MD223 GATE OUT
MD223 LOADED
SC511 GATE IN
SC511 LOADED
KR977 DISPATCHED
KR977 LOADED
I would appreciate your help.
Update
The initial data frame has duplicates in column B but the outcome must be the same.
Column A Column B
MD223 GATE IN
MD223 GATE OUT
MD223 LOADED
MD223 DEPARTURE
MD223 LOADED
SC511 GATE IN
SC511 LOADED
SC511 SHIPPED
SC511 LOADED
KR977 DISPATCHED
KR977 LOADED
KR977 SHIPPED
Expected outcome:
Column A Column B
MD223 GATE IN
MD223 GATE OUT
MD223 LOADED
SC511 GATE IN
SC511 LOADED
KR977 DISPATCHED
KR977 LOADED
Lets do cummax
per group in Column A to flag the all the rows before the first LOADED
the use loc
to select all the flagged rows
mask = df['Column B'] == 'LOADED'
df.loc[mask[::-1].groupby(df['Column A']).cummax()]
Update: If you have duplicates in Column B
mask = df['Column B'].eq('LOADED') & ~df.duplicated(subset=['Column A', 'Column B'])
df.loc[mask[::-1].groupby(df['Column A']).cummax()]
Result
Column A Column B
0 MD223 GATE IN
1 MD223 GATE OUT
2 MD223 LOADED
4 SC511 GATE IN
5 SC511 LOADED
7 KR977 DISPATCHED
8 KR977 LOADED