Search code examples
pythonexcellinear-regressioncorrelationpearson-correlation

How to find correlation coefficient r in excel sheet using pandas


I have an Excel worksheet which is 250 rows by 10 column of data. My dependent variable is n_nnld_trp and I am trying to find which independent variables are highly correlated with it to use in a linear regression model. enter image description here

I want to make a table like this to summarize the correlation data as well as identify any cases of multi-collinearity using the equation in the picture: enter image description here

So far, I managed to use a pivot table to get the mean of each row with my dependent variable being n_hhld_trp:

trip_mean = pd.pivot_table(read_excel, index=['n_hhld_trip'],  
                aggfunc=np.mean)

print(trip_mean.head())

I'm finding it hard to make the table of correlation as shown above and I would welcome and appreciate any help.


Solution

  • After a few hours of digging around I got it to how I wanted to present it. For anyone wanting to do something similar, see code below:

    import pandas as pd
    pd.set_option('display.max_rows', 500)
    pd.set_option('display.max_columns', 500)
    pd.set_option('display.width', 1000)
    pearson_correlation = read_excel.corr(method='pearson')
    print(pearson_correlation)
    

    enter image description here