Search code examples
pythondataframelambdaapplyrolling-computation

Python dataframe - rolling max and min search on different colum within lines until max


I can't find a method to solve this logical challenge with Python and Dataframes, searching for a lowest value in a nr of rows, where the nr of rows is dependent on a previous calc.

Note: I am pretty much starting level with python and dataframes - so apologies if I don't use the best code or formatting. Feel free to correct or suggest improvements so that I can learn.

df.columns=['Timestamp', 'Open', 'High', 'Low', 'Close', 'Volume']
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='ms')
df.set_index('Timestamp', inplace=True)
maxframes = 5
df['max5'] = df['High'].shift(-maxframes).rolling(maxframes).max()

This results in:

Timestamp Open High Low Close Volume max5
2023-02-01 00:00:00 23125.13 23140.47 23124.41 23133.74 199.09155 23168.90
2023-02-01 00:01:00 23133.74 23137.00 23123.15 23123.24 153.92866 23168.90
2023-02-01 00:02:00 23123.16 23145.99 23123.01 23142.98 204.22769 23168.90
2023-02-01 00:03:00 23144.00 23168.90 23141.75 23157.56 385.73800 23159.26
2023-02-01 00:04:00 23158.68 23159.26 23131.97 23133.11 209.27799 23136.90
2023-02-01 00:05:00 23133.11 23136.90 23086.73 23087.06 514.67716 23112.84
2023-02-01 00:06:00 23087.06 23112.84 23084.00 23101.51 425.65720
2023-02-01 00:07:00 23100.03 23105.14 23086.08 23086.35 272.72273
2023-02-01 00:08:00 23086.35 23104.53 23084.35 23090.76 218.65485
2023-02-01 00:09:00 23089.85 23091.31 23059.90 23067.90 558.62597

The challenge: max5 results in the highest value in column High of the next 4 rows. for each row I need to find the lowest value in colum Low in next rows (=from this row until that row where max5 is found).

example For row 1 (= 2023-02-01 00:00:00) max5 = 23168.90, found on 2023-02-01 00:03:00 the lowest low from 2023-02-01 00:00:00 to 2023-02-01 00:03:00 is 23123.01

I guess I need to create a lambda or a separate function and use it in an apply.

A lambda seems more appropriate for shorter logic, this is a bit too complex

With a function I enountered the next issues:

  1. I can't find a dataframe function that returns the rowid or rownr of the max5 value found in the High column
  2. I can't find how I can calculate the lowest value in the low column in that column, in the next rows until the row where max5 is found

Solution

  • I hope I've understood your question right:

    def my_func(x):
        if x.name > len(df) - 5:
            return np.nan
        idx_max = df.loc[x.name+1:x.name+5, 'High'].idxmax()
        min_val = df.loc[x.name+1:idx_max, 'Low'].min()
        return df['High'].iat[idx_max], min_val
    
    df = df.reset_index()
    df[['max5', 'min_val']] = df.apply(my_func, axis=1, result_type='expand')
    print(df)
    

    Prints:

                Timestamp      Open      High       Low     Close     Volume      max5   min_val
    0 2023-02-01 00:00:00  23125.13  23140.47  23124.41  23133.74  199.09155  23168.90  23123.01
    1 2023-02-01 00:01:00  23133.74  23137.00  23123.15  23123.24  153.92866  23168.90  23123.01
    2 2023-02-01 00:02:00  23123.16  23145.99  23123.01  23142.98  204.22769  23168.90  23141.75
    3 2023-02-01 00:03:00  23144.00  23168.90  23141.75  23157.56  385.73800  23159.26  23131.97
    4 2023-02-01 00:04:00  23158.68  23159.26  23131.97  23133.11  209.27799  23136.90  23086.73
    5 2023-02-01 00:05:00  23133.11  23136.90  23086.73  23087.06  514.67716  23112.84  23084.00
    6 2023-02-01 00:06:00  23087.06  23112.84  23084.00  23101.51  425.65720       NaN       NaN
    7 2023-02-01 00:07:00  23100.03  23105.14  23086.08  23086.35  272.72273       NaN       NaN
    8 2023-02-01 00:08:00  23086.35  23104.53  23084.35  23090.76  218.65485       NaN       NaN
    9 2023-02-01 00:09:00  23089.85  23091.31  23059.90  23067.90  558.62597       NaN       NaN