Search code examples
pandasfrequencyrolling-computation

Frequency over next x rows in dataframe


I need to figure out how to count the frequency a value below that from one column occurs over the next x rows in another column.

Pasting some data should make this clearer :

              DATE    Last Price    B50
  ID            
SPX Index   2020-04-03  2488.65 1244.325

SPX Index   2020-04-06  2663.68 1331.840

SPX Index   2020-04-07  2659.41 1329.705

SPX Index   2020-04-08  2749.98 1374.990

SPX Index   2020-04-09  2789.82 1394.910

SPX Index   2020-04-13  2761.63 1380.815

SPX Index   2020-04-14  2846.06 1423.030

SPX Index   2020-04-15  2783.36 1391.680

SPX Index   2020-04-16  2799.55 1399.775

SPX Index   2020-04-17  2874.56 1437.280

SPX Index   2020-04-20  2823.16 1411.580

SPX Index   2020-04-21  2736.56 1368.280

SPX Index   2020-04-22  2799.31 1399.655

SPX Index   2020-04-23  2797.80 1398.900

SPX Index   2020-04-24  2836.74 1418.370

SPX Index   2020-04-27  2878.48 1439.240

SPX Index   2020-04-28  2863.39 1431.695

SPX Index   2020-04-29  2939.51 1469.755

SPX Index   2020-04-30  2912.43 1456.215

SPX Index   2020-05-01  2830.71 1415.355

SPX Index   2020-05-04  2842.74 1421.370

SPX Index   2020-05-05  2868.44 1434.220

SPX Index   2020-05-06  2848.42 1424.210

SPX Index   2020-05-07  2881.19 1440.595

SPX Index   2020-05-08  2929.80 1464.900

SPX Index   2020-05-11  2930.32 1465.160

SPX Index   2020-05-12  2870.12 1435.060

SPX Index   2020-05-13  2820.00 1410.000

SPX Index   2020-05-14  2852.50 1426.250

SPX Index   2020-05-15  2863.70 1431.850

So I'd like to be able to input a day count (or row count if you'd rather). Lets use 20 in this example. I need to be able to then calculate how many times over the next 20 rows does the value in 'Last Price' fall below the first value in column 'B50' (1244.325) . I'd then record this "count" as a value in a new column 'Count B50' and that would be on date line 2020-04-03.

Then on the next row (date line 2020-06-04) I'd count the number of times over the next 20 rows the 'Last Price' value is below the second value in 'B50' (1331.84) and record this in column 'Count B50' on date line 2020-06-04...and so on and so on.


Solution

  • You can make use of the DataFrame method rolling. Since you want a window that "looks forward" for the next N rows (plus the actual one), you can do:

    indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=N+1)
    
    df['Count_B50'] = (df['Last_Price'].rolling(indexer, min_periods=1)
                                       .apply(lambda window: (window <
                                                              df['B50'].iloc[window.index.min()]
                                                              ).sum()))
    

    If you do not want to include the actual row in the computation, you can just add the keyword argument closed="right" inside the call of rolling.

    Either way, for your test data the output is as follows (for the first five rows):

           DATE  Last_Price       B50  Count_B50
     2020-04-03     2488.65  1244.325        0.0
     2020-04-06     2663.68  1331.840        0.0
     2020-04-07     2659.41  1329.705        0.0
     2020-04-08     2749.98  1374.990        0.0
     2020-04-09     2789.82  1394.910        0.0
    

    Let us know if you have any further issues!