Python Dataframe calculations based on another dataframe

I've 2 data frames.

I'd like to populate DF1 columns with data from DF2 using below calculations. Prev_close column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get previous row "CLOSE" column value and populate in DF1 "prev_close" column.

Below rest of columns are in % format. D1L column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get "LOW" column value and populate in DF1 "D1L" column as a formula ("low" - "prev_close") / (prev_close)

D1H column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get "HIGH" column value and populate in DF1 "D1H" column as a formula ("high" - "prev_close" ) / (prev_close)

D2L column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get 2nd row after lookup row (current row + 1) "LOW" column value and populate in DF1 "D2L" column as a formula ("low" - "prev_close") / (prev_close)

D2H column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get 2nd row after lookup row (current row + 1) "HIGH" column value and populate in DF1 "D2H" column as a formula ("high" - "prev_close" ) / (prev_close)

D3L column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get 3rd row after lookup row (current row + 2) "LOW" column value and populate in DF1 "D3L" column as a formula ("low" - "prev_close") / (prev_close)

D3H column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get 3rd row after lookup row (current row + 2) "HIGH" column value and populate in DF1 "D3H" column as a formula ("high" - "prev_close" ) / (prev_close)

in the end, i'm expecting output as follows and appreciate any help or code you can provide. Expected output of Dataframe1

Solution

I've tried to break it up step-by-step so its easier to follow.

df1 and df2 are what you provided.

The groupby function makes sure we don't mix up each symbol.
The shift function fetches the previous or next row.
The apply section is for converting the result into a blank if invalid, or a percentage to 3 decimal places if valid. {:.3%} You can change this number to change the decimal places required.
The merge function is the final step to only get the rows and calculations that you need.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Symbol': ['SPX', 'SPX', 'SPX', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL'],
                   'open': [2665.69, 2689.6, 2692.165, 123.5402, 128.1923, 124.5984, 121.5435, 120.7749, 120.7249],
                   'high': [2712.81, 2722.22, 2712.31, 127.7131, 128.5017, 125.4968, 123.3904, 121.7332, 120.7948],
                   'low': [2656.505, 2667.57, 2668.65, 122.5818, 124.798, 121.6334, 118.4189, 117.3706, 116.0129],
                   'close': [2697.82, 2683.36, 2695.03, 127.5733, 124.9078, 121.853, 119.9263, 121.2141, 116.1627],
                   'volume': [1483784, 1231996, 1198337, 1.16E+08, 1.02E+08, 1.13E+08, 1.78E+08, 1.54E+08, 1.54E+08],
                   'date': [20220301, 20220302, 20220303, 20220301, 20220302, 20220303, 20220304, 20220305, 20220306]})

df2 = pd.DataFrame({'Symbol': ['SPX', 'AAPL'], 'date': [20220302, 20220304]})

dfcal = df1
dfcal['Prev_Close'] = df1.groupby(['Symbol'])['close'].shift(1)
dfcal['D1L'] = ((dfcal.groupby(['Symbol'])['low'].shift(0) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D1H'] = ((dfcal.groupby(['Symbol'])['high'].shift(0) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D2L'] = ((dfcal.groupby(['Symbol'])['low'].shift(-1) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D2H'] = ((dfcal.groupby(['Symbol'])['high'].shift(-1) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D3L'] = ((dfcal.groupby(['Symbol'])['low'].shift(-2) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D3H'] = ((dfcal.groupby(['Symbol'])['high'].shift(-2) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D4L'] = ((dfcal.groupby(['Symbol'])['low'].shift(-3) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D4H'] = ((dfcal.groupby(['Symbol'])['high'].shift(-3) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))


dffinal = pd.merge(df2, dfcal[['Symbol', 'date', 'Prev_Close', 'D1L', 'D1H', 'D2L', 'D2H', 'D3L', 'D3H', 'D4L', 'D4H']], on = ['Symbol', 'date'], how = 'left')
dffinal