I need to figure out how to count the frequency a value below that from one column occurs over the next x rows in another column.
Pasting some data should make this clearer :
DATE Last Price B50
ID
SPX Index 2020-04-03 2488.65 1244.325
SPX Index 2020-04-06 2663.68 1331.840
SPX Index 2020-04-07 2659.41 1329.705
SPX Index 2020-04-08 2749.98 1374.990
SPX Index 2020-04-09 2789.82 1394.910
SPX Index 2020-04-13 2761.63 1380.815
SPX Index 2020-04-14 2846.06 1423.030
SPX Index 2020-04-15 2783.36 1391.680
SPX Index 2020-04-16 2799.55 1399.775
SPX Index 2020-04-17 2874.56 1437.280
SPX Index 2020-04-20 2823.16 1411.580
SPX Index 2020-04-21 2736.56 1368.280
SPX Index 2020-04-22 2799.31 1399.655
SPX Index 2020-04-23 2797.80 1398.900
SPX Index 2020-04-24 2836.74 1418.370
SPX Index 2020-04-27 2878.48 1439.240
SPX Index 2020-04-28 2863.39 1431.695
SPX Index 2020-04-29 2939.51 1469.755
SPX Index 2020-04-30 2912.43 1456.215
SPX Index 2020-05-01 2830.71 1415.355
SPX Index 2020-05-04 2842.74 1421.370
SPX Index 2020-05-05 2868.44 1434.220
SPX Index 2020-05-06 2848.42 1424.210
SPX Index 2020-05-07 2881.19 1440.595
SPX Index 2020-05-08 2929.80 1464.900
SPX Index 2020-05-11 2930.32 1465.160
SPX Index 2020-05-12 2870.12 1435.060
SPX Index 2020-05-13 2820.00 1410.000
SPX Index 2020-05-14 2852.50 1426.250
SPX Index 2020-05-15 2863.70 1431.850
So I'd like to be able to input a day count (or row count if you'd rather). Lets use 20 in this example. I need to be able to then calculate how many times over the next 20 rows does the value in 'Last Price' fall below the first value in column 'B50' (1244.325) . I'd then record this "count" as a value in a new column 'Count B50' and that would be on date line 2020-04-03.
Then on the next row (date line 2020-06-04) I'd count the number of times over the next 20 rows the 'Last Price' value is below the second value in 'B50' (1331.84) and record this in column 'Count B50' on date line 2020-06-04...and so on and so on.
You can make use of the DataFrame method rolling
. Since you want a window that "looks forward" for the next N
rows (plus the actual one), you can do:
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=N+1)
df['Count_B50'] = (df['Last_Price'].rolling(indexer, min_periods=1)
.apply(lambda window: (window <
df['B50'].iloc[window.index.min()]
).sum()))
If you do not want to include the actual row in the computation, you can just add the keyword argument closed="right"
inside the call of rolling
.
Either way, for your test data the output is as follows (for the first five rows):
DATE Last_Price B50 Count_B50
2020-04-03 2488.65 1244.325 0.0
2020-04-06 2663.68 1331.840 0.0
2020-04-07 2659.41 1329.705 0.0
2020-04-08 2749.98 1374.990 0.0
2020-04-09 2789.82 1394.910 0.0
Let us know if you have any further issues!