How to do the following in pandas-
df = pd.DataFrame({
'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
'ColD': [0,0,0,0,0, 22222,0,0,0,0,0,48,0,0,52,0,0,0]
})
Disregard all rows before 1st and after last occurrence of X in ColC
search for 1st occurrence of X in ColC, select one row below, take value (say, Y) from ColA from such selected row. (in df it is 11111)
search for 2nd occurrence of X in ColC, take value (say, Z) from ColB from such row. (in df it is 33333)
write result value (Z-Y) in ColD next to 2nd occurrence of X. (should be 33333-11111 = 22222)
Repeat for pairs of 2nd and 3rd occurrence of X, 3rd and 4th, 4th and 5th and so on till the end of df.
Expected result in ColD.
Using some shift
-fu and boolean indexing:
m = df['ColC'].eq('X')
df['ColD'] = df.loc[m, 'ColB'] - df['ColA'].shift(-1)[m].shift()
Or, if you want 0s:
m = df['ColC'].eq('X')
df['ColD'] = (df['ColB'].sub(df['ColA'].shift(-1)[m].shift())
.fillna(0, downcast='infer')
)
Output:
ColA ColB ColC ColD
0 1 11 A 0
1 2 5 B 0
2 3 22 X 0
3 11111 66 C 0
4 5 4 D 0
5 6 33333 X 22222
6 7 45 E 0
7 8 91 F 0
8 9 78 G 0
9 1 10 A 0
10 2 17 B 0
11 3 55 X 48
12 4 73 C 0
13 5 85 D 0
14 6 56 X 52
15 7 99 E 0
16 8 4 F 0
17 9 74 G 0
Intermediate:
# get the value of the row following the previous X
df['ColA'].shift(-1)[m].shift()
2 NaN
5 11111.0
11 7.0
14 4.0
Name: ColA, dtype: float64