I've 2 data frames.
I'd like to populate DF1 columns with data from DF2 using below calculations. Prev_close column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get previous row "CLOSE" column value and populate in DF1 "prev_close" column.
Below rest of columns are in % format. D1L column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get "LOW" column value and populate in DF1 "D1L" column as a formula ("low" - "prev_close") / (prev_close)
D1H column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get "HIGH" column value and populate in DF1 "D1H" column as a formula ("high" - "prev_close" ) / (prev_close)
D2L column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get 2nd row after lookup row (current row + 1) "LOW" column value and populate in DF1 "D2L" column as a formula ("low" - "prev_close") / (prev_close)
D2H column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get 2nd row after lookup row (current row + 1) "HIGH" column value and populate in DF1 "D2H" column as a formula ("high" - "prev_close" ) / (prev_close)
D3L column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get 3rd row after lookup row (current row + 2) "LOW" column value and populate in DF1 "D3L" column as a formula ("low" - "prev_close") / (prev_close)
D3H column : for each (symbol, date) as key from df1, lookup (symbol,date) in df2 and get 3rd row after lookup row (current row + 2) "HIGH" column value and populate in DF1 "D3H" column as a formula ("high" - "prev_close" ) / (prev_close)
in the end, i'm expecting output as follows and appreciate any help or code you can provide. Expected output of Dataframe1
I've tried to break it up step-by-step so its easier to follow.
df1
and df2
are what you provided.
groupby
function makes sure we don't mix up each symbol.shift
function fetches the previous or next row.apply
section is for converting the result into a blank if invalid, or a percentage to 3 decimal places if valid. {:.3%}
You can change this number to change the decimal places required.merge
function is the final step to only get the rows and calculations that you need.import pandas as pd
import numpy as np
df1 = pd.DataFrame({'Symbol': ['SPX', 'SPX', 'SPX', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL'],
'open': [2665.69, 2689.6, 2692.165, 123.5402, 128.1923, 124.5984, 121.5435, 120.7749, 120.7249],
'high': [2712.81, 2722.22, 2712.31, 127.7131, 128.5017, 125.4968, 123.3904, 121.7332, 120.7948],
'low': [2656.505, 2667.57, 2668.65, 122.5818, 124.798, 121.6334, 118.4189, 117.3706, 116.0129],
'close': [2697.82, 2683.36, 2695.03, 127.5733, 124.9078, 121.853, 119.9263, 121.2141, 116.1627],
'volume': [1483784, 1231996, 1198337, 1.16E+08, 1.02E+08, 1.13E+08, 1.78E+08, 1.54E+08, 1.54E+08],
'date': [20220301, 20220302, 20220303, 20220301, 20220302, 20220303, 20220304, 20220305, 20220306]})
df2 = pd.DataFrame({'Symbol': ['SPX', 'AAPL'], 'date': [20220302, 20220304]})
dfcal = df1
dfcal['Prev_Close'] = df1.groupby(['Symbol'])['close'].shift(1)
dfcal['D1L'] = ((dfcal.groupby(['Symbol'])['low'].shift(0) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D1H'] = ((dfcal.groupby(['Symbol'])['high'].shift(0) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D2L'] = ((dfcal.groupby(['Symbol'])['low'].shift(-1) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D2H'] = ((dfcal.groupby(['Symbol'])['high'].shift(-1) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D3L'] = ((dfcal.groupby(['Symbol'])['low'].shift(-2) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D3H'] = ((dfcal.groupby(['Symbol'])['high'].shift(-2) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D4L'] = ((dfcal.groupby(['Symbol'])['low'].shift(-3) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dfcal['D4H'] = ((dfcal.groupby(['Symbol'])['high'].shift(-3) - dfcal['Prev_Close'] ) / dfcal['Prev_Close']).apply(lambda x: '' if np.isnan(x) else "{:.3%}".format(x))
dffinal = pd.merge(df2, dfcal[['Symbol', 'date', 'Prev_Close', 'D1L', 'D1H', 'D2L', 'D2H', 'D3L', 'D3H', 'D4L', 'D4H']], on = ['Symbol', 'date'], how = 'left')
dffinal