Flagging a row according to the various amount of neighbour values in a pandas dataframe

I got a transactional operation that produces a feed like below:


df = pd.DataFrame({'action':['transacted','transacted','transacted','transacted','undo','transacted','transacted','transacted','transacted','transacted','undo','undo','undo','transacted'],
                  'transaction_count':10,20,35,60,60,60,80,90,100,10,10,100,90,90]})

	action	transaction_count
0	transacted	10
1	transacted	20
2	transacted	35
3	transacted	60
4	undo	60
5	transacted	60
6	transacted	80
7	transacted	90
8	transacted	100
9	transacted	10
10	undo	10
11	undo	100
12	undo	90
13	transacted	90

The counts are in a pattern but not in a linear way. (10-20-35-60-80-90-100-10-20...)

undo states which transaction count is cancelled.

There can be multiple undo's for multiple cancellations.

# This is an initial apply, to set it up
df['is_undone']=df.apply(lambda x: 1 if x['action']=='undo' else 0, axis=1).shift(-1)
df=df.fillna(0)  # For shift

df=df.loc[df['is_undone']==0]
df=df.fillna(0)
df=df.loc[df['action']!='undo']
df.reset_index(drop=True,inplace=True)

Unfortunately, it only works for single undo but not for multiple in a row. Apply does not let accessing neighbour row values and I can't think of any else solution. It should also need to calculate 300k rows, so, performance is also an issue.

Expected result is:

	action	transaction_count
0	transacted	10
1	transacted	20
2	transacted	35
3	transacted	60
4	transacted	80
5	transacted	90

Thanks in advance!

Solution

A slight modification of this excellent answer might give you what you want:

Solution

def undo(frame):
    d = {"transacted": 0, "undo": 1}
    condition = frame["action"].map(d).rolling(2,2).apply(lambda x: x.to_list()==[0,1]).mask(lambda x: x==0).bfill(limit=1).fillna(0)
    return frame[condition==0].reset_index(drop=True)
               
result = df.groupby("transaction_count").apply(undo).reset_index(drop=True)
>>> result
       action  transaction_count
0  transacted                 10
1  transacted                 20
2  transacted                 35
3  transacted                 60
4  transacted                 80
5  transacted                 90

Explanation

groupby is being used to deal with each transaction_count separately. As, an example, consider the case where transaction_count is 10.

frame = df[df["transaction_count"]==10]
>>> frame
        action  transaction_count
0   transacted                 10
9   transacted                 10
10        undo                 10

In the undo function, we first map the action column to a number:

>>> frame["action"].map(d)
0     0
9     0
10    1

Recognize that we want to remove rows where 0 (transacted) is immediately followed by 1 (undo). In the above, this corresponds to rows with indices 9 and 10.

To do this, we work with 2 rows of the above frame at a time using pd.rolling and lambda:

>>> frame["action"].map(d).rolling(2,2).apply(lambda x: x.to_list()==[0,1])
0     NaN
9     0.0
10    1.0

Now, mask 0s to np.nan, bfill (back-fill) exactly once, and fillna with 0.

>>> frame["action"].map(d).rolling(2,2).apply(lambda x: x.to_list()==[0,1]).mask(lambda x: x==0).bfill(limit=1).fillna(0)
0     0.0
9     1.0
10    1.0
Name: action, dtype: float64

From the above, we need all rows that are not equal to 1. This is what is returned in the undo function.