
Pandas apply function by columns


I have a DataFrame with dates (30/09/2022 to 30/11/2022) and the prices of 15 stocks (only 5 are shown below) for each of these dates, excluding weekends.

Current data:

   DATES   |   A   |   B   |   C   |   D   |   E   |
  30/09/22 | 100.5 | 151.3 | 233.4 | 237.2 | 38.42 |
  01/10/22 | 101.5 | 148.0 | 237.6 | 232.2 | 38.54 |
  02/10/22 | 102.2 | 147.6 | 238.3 | 231.4 | 39.32 |
  03/10/22 | 103.4 | 145.7 | 239.2 | 232.2 | 39.54 |

I wanted to get the Pearson correlation matrix of the daily log returns, so I did this:

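import numpy as np
import pandas as pd

df = pd.read_excel(file_path, sheet_name=sheet_name)
df = df.dropna()  # Remove dates that do not have prices for all stocks
# Daily log returns log(P_t / P_{t-1}); keep DATES as the index so corr() only sees prices,
# and drop the first row, which has no previous price
log_df = df.set_index("DATES").pipe(lambda d: np.log(d.div(d.shift()))).dropna()
corrM = log_df.corr()  # Pearson (centered) correlation matrix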
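The steps above can be tried without the Excel file on the four sample rows from the table. This is only a sketch: the DataFrame literal stands in for `pd.read_excel`, and dropping the first (NaN) return row is an added step so every column is complete. Three returns are far too few for meaningful correlations; the point is just to check the mechanics.

```python
import numpy as np
import pandas as pd

# The four sample rows from the table (stand-in for pd.read_excel)
df = pd.DataFrame(
    {
        "DATES": ["30/09/22", "01/10/22", "02/10/22", "03/10/22"],
        "A": [100.5, 101.5, 102.2, 103.4],
        "B": [151.3, 148.0, 147.6, 145.7],
        "C": [233.4, 237.6, 238.3, 239.2],
        "D": [237.2, 232.2, 231.4, 232.2],
        "E": [38.42, 38.54, 39.32, 39.54],
    }
)
df = df.dropna()  # drop dates missing any price

# Daily log returns log(P_t / P_{t-1}); the first row has no previous price
log_df = df.set_index("DATES").pipe(lambda d: np.log(d.div(d.shift()))).dropna()

corrM = log_df.corr()  # centered Pearson correlation, 5 x 5
print(corrM.round(3))
```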

Now I want to build the uncentered Pearson correlation matrix (the same formula, but without subtracting the column means), so I wrote the following function:

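def uncentered_correlation(x, y):
    # Convert to NumPy arrays so x[i] is positional; on a pandas Series, x[i]
    # is a label lookup and fails once dropna() has left gaps in the index
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    xy = 0.0
    xx = 0.0
    yy = 0.0
    for i in range(len(x)):
        xy += x[i] * y[i]
        xx += x[i] ** 2.0
        yy += y[i] ** 2.0

    # sum(x*y) / sqrt(sum(x^2) * sum(y^2))
    corr = xy / np.sqrt(xx * yy)
    return corr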
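For reference, the function computes the first quantity below for two series $x$ and $y$ of length $n$. The centered Pearson coefficient returned by `corr()` is the second one; it differs only in subtracting the means $\bar{x}$ and $\bar{y}$ first, so the two coincide when both means are zero:

$$
r_{\text{unc}}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2 \, \sum_{i=1}^{n} y_i^2}},
\qquad
r(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The uncentered version is the cosine similarity of the two vectors, so $r_{\text{unc}}(x, x) = 1$ on the diagonal of the matrix.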

However, I do not know how to apply this function to each possible pair of columns of the dataframe to get the correlation matrix.


Solution

  • Try this. It is not very elegant, but it should work for you. :)

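    import pandas as pd
    from itertools import product

    # Your DataFrame of numeric columns only, with no missing values
    # (e.g. the log returns, with DATES as the index)
    df = log_df
    re_dict = {}
    # Every ordered pair of columns, including each column with itself
    for col_a, col_b in product(df.columns, repeat=2):
        re_dict[(col_a, col_b)] = uncentered_correlation(df[col_a], df[col_b])
    # Reshape {(col_a, col_b): value} into a square matrix, keeping the column order
    re_df = pd.Series(re_dict).unstack().loc[df.columns, df.columns]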
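A possibly simpler route, sketched here assuming a pandas version where `DataFrame.corr` accepts a callable `method` (it passes each pair of columns as 1-D NumPy arrays and sets the diagonal to 1, which is correct for this measure anyway). The same matrix can also be computed in one shot with NumPy, since it is the Gram matrix $X^\top X$ divided by the outer product of the column norms. The helper `uncentered_corr_matrix` is a hypothetical name, and the prices are the first rows of the question's table, used only for illustration.

```python
import numpy as np
import pandas as pd


def uncentered_correlation(x, y):
    # sum(x*y) / sqrt(sum(x^2) * sum(y^2)); works on NumPy arrays or Series
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y))


def uncentered_corr_matrix(df):
    # Vectorized version: Gram matrix X^T X scaled by the column norms
    X = df.to_numpy(dtype=float)
    gram = X.T @ X
    norms = np.sqrt(np.diag(gram))
    return pd.DataFrame(
        gram / np.outer(norms, norms), index=df.columns, columns=df.columns
    )


# Illustrative log returns built from the question's sample prices
prices = pd.DataFrame(
    {
        "A": [100.5, 101.5, 102.2, 103.4],
        "B": [151.3, 148.0, 147.6, 145.7],
        "C": [233.4, 237.6, 238.3, 239.2],
    }
)
log_df = np.log(prices / prices.shift()).dropna()

# pandas calls the function once per pair of columns
via_pandas = log_df.corr(method=uncentered_correlation)
via_numpy = uncentered_corr_matrix(log_df)
print(via_pandas.round(3))
```

The NumPy version avoids the Python-level loop over pairs, which matters little for 15 stocks but scales better; the `corr(method=...)` version also drops missing values pairwise for you.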