
Finding correlation in dataframe


I have a pandas DataFrame (df) whose columns are named x_1, x_2, ..., x_n. I want to compute the Pearson correlation between the ith column and each of the remaining columns.

One way to do this is with the .corr() method:

correlation = df.corr(method='pearson')
corr_i = correlation['x_i']

but this is a bit expensive, since it computes the correlation between every pair of columns when I only need the correlations for one column. Another option is:

corr_i = [df['x_i'].corr(df[j], method='pearson') for j in df.columns.tolist() if j != 'x_i']

but this doesn't feel like an efficient way to compute the correlation, given how flexible DataFrames are. Is there a more efficient method than the two above? Thanks in advance.
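For comparison, here is a self-contained sketch of the two approaches from the question. The random data, the column names x_1..x_4 and the choice of x_2 as the target are illustrative assumptions. It also makes two small adjustments. It drops the target's correlation with itself, which the full-matrix column always includes. It also keeps the loop results in a labeled Series so each value stays tied to its column name.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 100 rows, columns x_1..x_4 of random values
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["x_1", "x_2", "x_3", "x_4"])
target = "x_2"  # stands in for the question's 'x_i'

# Approach 1: build the full n-by-n Pearson matrix, then keep one column.
# That column contains the target's correlation with itself (~1.0), so drop it.
corr_full = df.corr(method="pearson")[target].drop(target)

# Approach 2: one Series.corr call per remaining column, collected into a
# Series indexed by column name instead of an unlabeled list.
corr_loop = pd.Series(
    {col: df[target].corr(df[col], method="pearson") for col in df.columns if col != target}
)

print(corr_full)
print(corr_loop)
```

Both produce the same values. The first does extra work by filling in correlations between pairs of non-target columns.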


Solution

  • corrwith() might be what you are looking for.

    Say you have a DataFrame with columns c1, c2, c3, c4.

    Then you can correlate c2, c3 and c4 with c1 like this:

    df[['c2','c3','c4']].corrwith(df['c1'])
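A runnable sketch of the corrwith approach follows. The random data is made up for illustration. It also shows a generic form, an illustrative variation rather than part of the original answer, that drops the target column instead of listing the other columns by hand. The last line checks the result against the target's column of df.corr(). When given a Series, corrwith correlates each column with that Series, so it avoids building the full pairwise matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with columns c1..c4, as in the answer
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["c1", "c2", "c3", "c4"])

# Correlate c2, c3 and c4 with c1; method="pearson" is the default
corr_c1 = df[["c2", "c3", "c4"]].corrwith(df["c1"], method="pearson")

# Generic form for any target column: drop it, then correlate the rest with it
target = "c1"
corr_target = df.drop(columns=target).corrwith(df[target])

# Compare with the target's column of the full matrix (minus its self-correlation)
expected = df.corr()[target].drop(target)
print(np.allclose(corr_target, expected))
```

The result is a Series indexed by the remaining column names, so it can be sorted or filtered directly, e.g. corr_target.abs().sort_values(ascending=False).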