Count number of consecutive rows that are greater than current row value but less than the value from other column

Say I have the following sample dataframe (there are about 25k rows in the real dataframe)

df = pd.DataFrame({'A' : [0,3,2,9,1,0,4,7,3,2], 'B': [9,8,3,5,5,5,5,8,0,4]})
df
   A  B
0  0  9
1  3  8
2  2  3
3  9  5
4  1  5
5  0  5
6  4  5
7  7  8
8  3  0
9  2  4

For the column A I need to know how many next and previous rows are greater than current row value but less than value in column B.

So my expected output is :

A   B next count  previous count
0   9     2          0
3   8     0          0
2   3     0          1
9   5     0          0
1   5     0          0
0   5     2          1
4   5     1          0
7   8     0          0
3   0     0          2
2   4     0          0

Explanation :

First row is calculated as : since 3 and 2 are greater than 0 but less than corresponding B value 8 and 3
Second row is calculated as : since next value 2 is not greater than 3
Third row is calculated as : since 9 is greater than 2 but not greater than its corresponding B value

Similarly, previous count is calculated

Note : I know how to solve this problem by looping using list comprehension or using the pandas apply method but still I won't mind a clear and concise apply approach. I was looking for a more pandaic approach.

My Solution

Here is the apply solution, which I think is inefficient. Also, as people said that there might be no vector solution for the question. So as mentioned, a more efficient apply solution will be accepted for this question.

This is what I have tried.

This function gets the number of previous/next rows that satisfy the condition.

def get_prev_next_count(row):
    next_nrow = df.loc[row['index']+1:,['A', 'B']]
    prev_nrow = df.loc[:row['index']-1,['A', 'B']][::-1]
    if (next_nrow.size == 0):
        return 0, ((prev_nrow.A > row.A) & (prev_nrow.A < prev_nrow.B)).argmin()
    if (prev_nrow.size == 0):
        return ((next_nrow.A > row.A) & (next_nrow.A < next_nrow.B)).argmin(), 0
    return (((next_nrow.A > row.A) & (next_nrow.A < next_nrow.B)).argmin(), ((prev_nrow.A > row.A) & (prev_nrow.A < prev_nrow.B)).argmin())

Generating output :

df[['next count', 'previous count']] = df.reset_index().apply(get_prev_next_count, axis=1, result_type="expand")

Output :

This gives us the expected output

df
   A  B  next count  previous count
0  0  9           2               0
1  3  8           0               0
2  2  3           0               1
3  9  5           0               0
4  1  5           0               0
5  0  5           2               1
6  4  5           1               0
7  7  8           0               0
8  3  0           0               2
9  2  4           0               0

Solution

I made some optimizations:

You don't need to reset_index() you can access the index with .name
If you only pass df[['A']] instead of the whole frame, that may help.
prev_nrow.empty is the same as (prev_nrow.size == 0)
Applied different logic to get the desired value via first_false, this speeds things up significantly.

def first_false(val1, val2, A):
    i = 0
    for x, y in zip(val1, val2):
        if A < x < y:
            i += 1
        else:
            break
    return i

def get_prev_next_count(row):
    A = row['A']
    next_nrow = df.loc[row.name+1:,['A', 'B']]
    prev_nrow = df2.loc[row.name-1:,['A', 'B']]
    if next_nrow.empty:
        return 0, first_false(prev_nrow.A, prev_nrow.B, A)
    if prev_nrow.empty:
        return first_false(next_nrow.A, next_nrow.B, A), 0
    return (first_false(next_nrow.A, next_nrow.B, A),
            first_false(prev_nrow.A, prev_nrow.B, A))

df2 = df[::-1].copy() # Shave a tiny bit of time by only reversing it once~

df[['next count', 'previous count']] = df[['A']].apply(get_prev_next_count, axis=1, result_type='expand')
print(df)

Output:

   A  B  next count  previous count
0  0  9           2               0
1  3  8           0               0
2  2  3           0               1
3  9  5           0               0
4  1  5           0               0
5  0  5           2               1
6  4  5           1               0
7  7  8           0               0
8  3  0           0               2
9  2  4           0               0

Timing

Expanding the data:

df = pd.concat([df]*(10000//4), ignore_index=True)
# df.shape == (25000, 2)

Original Method:

Gave up at 15 minutes.

New Method:

1m 20sec

Throw pandarallel at it:

from pandarallel import pandarallel
pandarallel.initialize()

df[['A']].parallel_apply(get_prev_next_count, axis=1, result_type='expand')

26sec