I have the following dataframe:
df = pd.DataFrame({'id':[1,1,1,2,2,3,3,4,5,6,6,6,6,6,8,8,9,11,12,12],'letter':['A','A','Q','Q','Q','F','F','G','D','G','I','I','K','Q','E','S','S','I','I','F']})
My objective is to add another column tx that shows the followings: if it finds Q and there after an I - mark it as 1st transaction. Both Q and I must exists and must have the same comes as last_Q --> first_I.
so the end result should look like this:
I would use boolean arithmetic:
# map True/False to Q/I
m1 = df['letter'].map({'Q': True, 'I': False})
# ffill the values
m2 = m1.ffill().fillna(False)
# only keep last Qs
m3 = m1.shift(-1).ne(True)
m4 = m2&m3
df['tx'] = (m1&m3).cumsum().where(m4|m4.shift(), 0)
Output:
id letter tx
0 1 A 0
1 1 A 0
2 1 Q 0
3 2 Q 0
4 2 Q 1
5 3 F 1
6 3 F 1
7 4 G 1
8 5 D 1
9 6 G 1
10 6 I 1
11 6 I 0
12 6 K 0
13 6 Q 2
14 8 E 2
15 8 S 2
16 9 S 2
17 11 I 2
18 12 I 0
19 12 F 0
Intermediates:
id letter tx m1 m2 m3 m4
0 1 A 0 NaN False True False
1 1 A 0 NaN False False False
2 1 Q 0 True True False False
3 2 Q 0 True True False False
4 2 Q 1 True True True True
5 3 F 1 NaN True True True
6 3 F 1 NaN True True True
7 4 G 1 NaN True True True
8 5 D 1 NaN True True True
9 6 G 1 NaN True True True
10 6 I 1 False False True False
11 6 I 0 False False True False
12 6 K 0 NaN False False False
13 6 Q 2 True True True True
14 8 E 2 NaN True True True
15 8 S 2 NaN True True True
16 9 S 2 NaN True True True
17 11 I 2 False False True False
18 12 I 0 False False True False
19 12 F 0 NaN False True False