I have a dataframe with dates (30/09/2022 to 31/11/2022) and the prices of 15 stocks for each of these dates, excluding weekends (only 5 stocks are shown here for reference).
Current Data:
DATES | A | B | C | D | E |
30/09/22 |100.5|151.3|233.4|237.2|38.42|
01/10/22 |101.5|148.0|237.6|232.2|38.54|
02/10/22 |102.2|147.6|238.3|231.4|39.32|
03/10/22 |103.4|145.7|239.2|232.2|39.54|
I wanted to get the Pearson correlation matrix, so I did this:
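import numpy as np
import pandas as pd
df = pd.read_excel(file_path, sheet_name)
df = df.dropna()  # Remove dates that do not have prices for all stocks
# Daily log returns, log(P_t / P_{t-1}); the first row is NaN because it has no previous price
log_df = df.set_index("DATES").pipe(lambda d: np.log(d.div(d.shift()))).reset_index()
# Correlate only the return columns; DATES is not numeric
corrM = log_df.drop(columns="DATES").corr()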
Now I want to build the Pearson Uncentered Correlation Matrix, so I have the following function:
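def uncentered_correlation(x, y):
    # Work on NumPy arrays so that x[i] is positional, not a pandas label lookup
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xy = 0.0
    xx = 0.0
    yy = 0.0
    for i in range(len(x)):
        xy = xy + x[i] * y[i]
        xx = xx + x[i] ** 2.0
        yy = yy + y[i] ** 2.0
    # Same as Pearson's formula, but without subtracting the means
    corr = xy / np.sqrt(xx * yy)
    return corr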
However, I do not know how to apply this function to each possible pair of columns of the dataframe to get the correlation matrix.
Try this. It is not very elegant, but it should work for you. :)
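from itertools import product
import pandas as pd
# df must hold only the numeric columns to correlate and no NaN rows,
# e.g. the log returns from the question with DATES moved to the index
df = log_df.set_index("DATES").dropna()
re_dict = {}
for a, b in product(df.columns, df.columns):
    # Pass NumPy arrays so the x[i] lookups inside the function are positional
    re_dict[(a, b)] = uncentered_correlation(df[a].to_numpy(), df[b].to_numpy())
# The keys are (row, column) pairs; unstack them into a square matrix
re_df = pd.Series(re_dict).unstack()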
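If the pairwise loop turns out to be slow, the same matrix can probably be computed in one step with NumPy. The sketch below is only an illustration under a few assumptions: the returns frame contains numeric columns only and has no NaN rows, the helper names `uncentered_corr_matrix` and `_uncentered` are made up for this example, and the sample prices are the first three columns of the table in the question. It also shows that `DataFrame.corr` accepts a callable as `method`, which applies a two-argument function to every pair of columns.

```python
import numpy as np
import pandas as pd


def uncentered_corr_matrix(returns: pd.DataFrame) -> pd.DataFrame:
    """Uncentered correlation between all pairs of columns (hypothetical helper)."""
    X = returns.to_numpy(dtype=float)
    gram = X.T @ X                    # gram[i, j] = sum_t x_i[t] * x_j[t]
    norms = np.sqrt(np.diag(gram))    # sqrt(sum_t x_i[t] ** 2) for each column
    return pd.DataFrame(
        gram / np.outer(norms, norms),
        index=returns.columns,
        columns=returns.columns,
    )


# Sample prices copied from the question (columns A to C only)
prices = pd.DataFrame(
    {
        "A": [100.5, 101.5, 102.2, 103.4],
        "B": [151.3, 148.0, 147.6, 145.7],
        "C": [233.4, 237.6, 238.3, 239.2],
    }
)

# Log returns; drop the first row, which is NaN
returns = np.log(prices / prices.shift()).dropna()

ucorr = uncentered_corr_matrix(returns)


def _uncentered(x, y):
    # Receives two 1-D NumPy arrays when used as a corr() method
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))


# Alternative: let pandas loop over the column pairs
ucorr_alt = returns.corr(method=_uncentered)
```

Both `ucorr` and `ucorr_alt` are square DataFrames indexed by the column names. The matrix-product version avoids Python-level loops, while the `corr(method=...)` version reuses a function with the same shape as the one in the question.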