python pandas scipy sklearn-pandas pearson-correlation

python scipy spearman correlations


I am trying to take the column names from the dataframe (df) and attach them to the arrays returned by scipy's spearmanr function. I need each column name (a-j) matched to both its correlation value (spearman) and its p-value (spearman_pvalue). Is there an intuitive way to do this?

from scipy.stats import spearmanr
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=list('abcdefghij'))

def binary(row):
    if row >= 50:
        return 1
    else:
        return 0
df['target'] = df.a.apply(binary)

spearman, spearman_pvalue = spearmanr(df.drop(['target'], axis=1), df.target)
print(spearman)
print(spearman_pvalue)

Solution

  • It seems you need:

    import numpy as np
    import pandas as pd
    from scipy.stats import spearmanr

    df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=list('abcdefghij'))

    # vectorized comparison; faster than apply() for building the binary target
    df['target'] = (df['a'] >= 50).astype(int)

    # both results are 11x11 arrays: columns a-j followed by target
    spearman, spearman_pvalue = spearmanr(df.drop(['target'], axis=1), df.target)

    df1 = pd.DataFrame(spearman.reshape(-1, 11), columns=df.columns)
    df2 = pd.DataFrame(spearman_pvalue.reshape(-1, 11), columns=df.columns)

    # label the rows with the column names too, so the whole matrix is labelled
    df1 = df1.set_index(df.columns)
    df2 = df2.set_index(df.columns)
    

    Or pass the column names as both index and columns in one step:

    df1 = pd.DataFrame(spearman.reshape(-1, 11),
                       columns=df.columns,
                       index=df.columns)
    df2 = pd.DataFrame(spearman_pvalue.reshape(-1, 11),
                       columns=df.columns,
                       index=df.columns)
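
If only the relationship between each column and target is of interest, the labelled matrices can be reduced to one small table. The following is a minimal sketch, not part of the original answer: the names rho_df, p_df and summary are illustrative, the fixed seed is there only to make the run repeatable, and it assumes target is the last column of df so that df.columns matches the order in which spearmanr stacks its two inputs.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Fixed seed (an assumption for this sketch) so the run is repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 10)), columns=list("abcdefghij"))
df["target"] = (df["a"] >= 50).astype(int)

# spearmanr stacks the second argument after the columns of the first,
# so the result is an 11x11 matrix ordered a..j, target.
rho, pvalue = spearmanr(df.drop(columns="target"), df["target"])

labels = df.columns  # a..j, target: same order as the stacked inputs
rho_df = pd.DataFrame(rho, index=labels, columns=labels)
p_df = pd.DataFrame(pvalue, index=labels, columns=labels)

# Keep only the "target" column of each matrix and drop target's self-correlation.
summary = pd.DataFrame(
    {"spearman": rho_df["target"], "spearman_pvalue": p_df["target"]}
).drop(index="target")

print(summary)
```

Because the arrays returned here are already square, no reshape is needed, and using labels for both index and columns avoids hard-coding the number 11.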