pandas select rows and cell values based on column and other conditions

How to do the following in pandas-

df = pd.DataFrame({
    'ColA': [1, 2, 3, 11111, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'ColB': [11, 5, 22, 66, 4, 33333, 45, 91, 78, 10, 17, 55, 73, 85, 56, 99, 4, 74],
    'ColC': ['A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G', 'A', 'B', 'X', 'C', 'D', 'X', 'E', 'F', 'G'],
    'ColD': [0,0,0,0,0, 22222,0,0,0,0,0,48,0,0,52,0,0,0]
})

Disregard all rows before 1st and after last occurrence of X in ColC

search for 1st occurrence of X in ColC, select one row below, take value (say, Y) from ColA from such selected row. (in df it is 11111)

search for 2nd occurrence of X in ColC, take value (say, Z) from ColB from such row. (in df it is 33333)

write result value (Z-Y) in ColD next to 2nd occurrence of X. (should be 33333-11111 = 22222)

Repeat for pairs of 2nd and 3rd occurrence of X, 3rd and 4th, 4th and 5th and so on till the end of df.

Expected result in ColD.

Solution

Using some shift-fu and boolean indexing:

m = df['ColC'].eq('X')

df['ColD'] = df.loc[m, 'ColB'] - df['ColA'].shift(-1)[m].shift()

Or, if you want 0s:

m = df['ColC'].eq('X')

df['ColD'] = (df['ColB'].sub(df['ColA'].shift(-1)[m].shift())
                .fillna(0, downcast='infer')
             )

Output:

     ColA   ColB ColC   ColD
0       1     11    A      0
1       2      5    B      0
2       3     22    X      0
3   11111     66    C      0
4       5      4    D      0
5       6  33333    X  22222
6       7     45    E      0
7       8     91    F      0
8       9     78    G      0
9       1     10    A      0
10      2     17    B      0
11      3     55    X     48
12      4     73    C      0
13      5     85    D      0
14      6     56    X     52
15      7     99    E      0
16      8      4    F      0
17      9     74    G      0

Intermediate:

# get the value of the row following the previous X
df['ColA'].shift(-1)[m].shift()

2         NaN
5     11111.0
11        7.0
14        4.0
Name: ColA, dtype: float64