I can't find a method to solve this logical challenge with Python and Dataframes, searching for a lowest value in a nr of rows, where the nr of rows is dependent on a previous calc.
Note: I am pretty much starting level with python and dataframes - so apologies if I don't use the best code or formatting. Feel free to correct or suggest improvements so that I can learn.
df.columns=['Timestamp', 'Open', 'High', 'Low', 'Close', 'Volume']
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='ms')
df.set_index('Timestamp', inplace=True)
maxframes = 5
df['max5'] = df['High'].shift(-maxframes).rolling(maxframes).max()
This results in:
Timestamp | Open | High | Low | Close | Volume | max5 |
---|---|---|---|---|---|---|
2023-02-01 00:00:00 | 23125.13 | 23140.47 | 23124.41 | 23133.74 | 199.09155 | 23168.90 |
2023-02-01 00:01:00 | 23133.74 | 23137.00 | 23123.15 | 23123.24 | 153.92866 | 23168.90 |
2023-02-01 00:02:00 | 23123.16 | 23145.99 | 23123.01 | 23142.98 | 204.22769 | 23168.90 |
2023-02-01 00:03:00 | 23144.00 | 23168.90 | 23141.75 | 23157.56 | 385.73800 | 23159.26 |
2023-02-01 00:04:00 | 23158.68 | 23159.26 | 23131.97 | 23133.11 | 209.27799 | 23136.90 |
2023-02-01 00:05:00 | 23133.11 | 23136.90 | 23086.73 | 23087.06 | 514.67716 | 23112.84 |
2023-02-01 00:06:00 | 23087.06 | 23112.84 | 23084.00 | 23101.51 | 425.65720 | |
2023-02-01 00:07:00 | 23100.03 | 23105.14 | 23086.08 | 23086.35 | 272.72273 | |
2023-02-01 00:08:00 | 23086.35 | 23104.53 | 23084.35 | 23090.76 | 218.65485 | |
2023-02-01 00:09:00 | 23089.85 | 23091.31 | 23059.90 | 23067.90 | 558.62597 |
The challenge: max5 results in the highest value in column High of the next 4 rows. for each row I need to find the lowest value in colum Low in next rows (=from this row until that row where max5 is found).
example For row 1 (= 2023-02-01 00:00:00) max5 = 23168.90, found on 2023-02-01 00:03:00 the lowest low from 2023-02-01 00:00:00 to 2023-02-01 00:03:00 is 23123.01
I guess I need to create a lambda or a separate function and use it in an apply.
A lambda seems more appropriate for shorter logic, this is a bit too complex
With a function I enountered the next issues:
I hope I've understood your question right:
def my_func(x):
if x.name > len(df) - 5:
return np.nan
idx_max = df.loc[x.name+1:x.name+5, 'High'].idxmax()
min_val = df.loc[x.name+1:idx_max, 'Low'].min()
return df['High'].iat[idx_max], min_val
df = df.reset_index()
df[['max5', 'min_val']] = df.apply(my_func, axis=1, result_type='expand')
print(df)
Prints:
Timestamp Open High Low Close Volume max5 min_val
0 2023-02-01 00:00:00 23125.13 23140.47 23124.41 23133.74 199.09155 23168.90 23123.01
1 2023-02-01 00:01:00 23133.74 23137.00 23123.15 23123.24 153.92866 23168.90 23123.01
2 2023-02-01 00:02:00 23123.16 23145.99 23123.01 23142.98 204.22769 23168.90 23141.75
3 2023-02-01 00:03:00 23144.00 23168.90 23141.75 23157.56 385.73800 23159.26 23131.97
4 2023-02-01 00:04:00 23158.68 23159.26 23131.97 23133.11 209.27799 23136.90 23086.73
5 2023-02-01 00:05:00 23133.11 23136.90 23086.73 23087.06 514.67716 23112.84 23084.00
6 2023-02-01 00:06:00 23087.06 23112.84 23084.00 23101.51 425.65720 NaN NaN
7 2023-02-01 00:07:00 23100.03 23105.14 23086.08 23086.35 272.72273 NaN NaN
8 2023-02-01 00:08:00 23086.35 23104.53 23084.35 23090.76 218.65485 NaN NaN
9 2023-02-01 00:09:00 23089.85 23091.31 23059.90 23067.90 558.62597 NaN NaN