python pandas dataframe correlation

Correlation heatmap between two columns from different datasets in a Jupyter notebook


I would like help with building a correlation matrix from two different datasets and plotting it as a heatmap.

Listed below is the sample data:

Expression   PR     Metrics
Engagement   0.33   0.70
Excitement   0.33   0.15
Focus        0.33   0.36
Interest     0.67   0.47
Relaxation   0.55   0.20
Stress       0.44   0.40

This data is not imported from a CSV file (it will need to be modified in future), so it is created directly as a DataFrame, and the values are converted to float using astype(float).

Here is how I created the df and converted the types:

import pandas as pd

data = {
    'Expression': ['Engagement', 'Excitement', 'Focus', 'Interest', 'Relaxation', 'Stress'],
    'PR': ['0.33', '0.33', '0.33', '0.67', '0.55', '0.44'],
    'Metrics': ['0.70', '0.15', '0.36', '0.47', '0.20', '0.40']
}
df = pd.DataFrame(data)  # build the DataFrame from the dict

df['PR'] = df['PR'].astype(float)  # convert object dtype to float
df['Metrics'] = df['Metrics'].astype(float)  # convert object dtype to float
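
As a side note, both columns can be converted in one call by passing a dict to astype. This is only a minimal sketch, and it assumes the column names match the keys used in data (PR and Metrics):

```python
import pandas as pd

data = {
    'Expression': ['Engagement', 'Excitement', 'Focus', 'Interest', 'Relaxation', 'Stress'],
    'PR': ['0.33', '0.33', '0.33', '0.67', '0.55', '0.44'],
    'Metrics': ['0.70', '0.15', '0.36', '0.47', '0.20', '0.40'],
}
df = pd.DataFrame(data)

# Convert both string columns to float in a single astype call
df = df.astype({'PR': float, 'Metrics': float})

# Check the result: PR and Metrics should now be float64
print(df.dtypes)
```

Checking df.dtypes after the conversion is a quick way to confirm that corr() will actually see numeric columns.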

After that, df.corr() only gives the correlation between the two columns, as shown:

                      PR        Metrics
PR              1.000000       -0.048189
Metrics        -0.048189        1.000000
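
For context, df.corr() compares columns with each other, so two numeric columns always give a 2 x 2 result no matter how many rows there are. The small sketch below reproduces that behaviour and reads off the single PR/Metrics coefficient with Series.corr (the numeric values are typed in directly for brevity, which is an assumption about how the data is stored):

```python
import pandas as pd

df = pd.DataFrame({
    'Expression': ['Engagement', 'Excitement', 'Focus', 'Interest', 'Relaxation', 'Stress'],
    'PR': [0.33, 0.33, 0.33, 0.67, 0.55, 0.44],
    'Metrics': [0.70, 0.15, 0.36, 0.47, 0.20, 0.40],
})

# corr() works column against column; the text column is left out explicitly
corr = df[['PR', 'Metrics']].corr()
print(corr.shape)  # (2, 2): one row and one column per numeric column

# The off-diagonal entry is the single Pearson coefficient between PR and Metrics
r = df['PR'].corr(df['Metrics'])
print(round(r, 6))
```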

However, what I would like to generate is a correlation matrix that shows the correlation between EACH expression from PR and Metrics, as in the attached image, with Metrics and PR included.

[image: desired heatmap]

How should I go about this?

Also, if there are any errors in the code above, please point them out.


Solution

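  • Use DataFrame.dot with the transposed DataFrame, then plot the result with seaborn.heatmap. Note that this gives the pairwise dot products of each expression's (PR, Metrics) values rather than a Pearson correlation, and it assumes PR and Metrics are already float: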

    import seaborn as sb
    
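    # Index by expression and keep only the numeric columns
    df1 = df.set_index('Expression')[['PR','Metrics']]
    # Dot product of every expression's row with every other row (6 x 6 result)
    df = df1.dot(df1.T).rename_axis(index='name1', columns='name2')
    print(df)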
    
    name2       Engagement  Excitement   Focus  Interest  Relaxation  Stress
    name1                                                                   
    Engagement      0.5989      0.2139  0.3609    0.5501      0.3215  0.4252
    Excitement      0.2139      0.1314  0.1629    0.2916      0.2115  0.2052
    Focus           0.3609      0.1629  0.2385    0.3903      0.2535  0.2892
    Interest        0.5501      0.2916  0.3903    0.6698      0.4625  0.4828
    Relaxation      0.3215      0.2115  0.2535    0.4625      0.3425  0.3220
    Stress          0.4252      0.2052  0.2892    0.4828      0.3220  0.3536
    
    sb.heatmap(df, annot=True)
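
One thing worth knowing about this result: it is a matrix of raw dot products, so the values depend on the magnitude of the numbers and the diagonal is not 1. If something on a correlation-like scale is wanted, one option (a swapped-in technique, not what the answer above does) is cosine similarity, which divides each dot product by the two row norms. A true row-wise Pearson correlation via df1.T.corr() is not very informative here, because each expression has only two values, and two points are always perfectly correlated or anti-correlated. A rough sketch of both, with the numeric values typed in directly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Expression': ['Engagement', 'Excitement', 'Focus', 'Interest', 'Relaxation', 'Stress'],
    'PR': [0.33, 0.33, 0.33, 0.67, 0.55, 0.44],
    'Metrics': [0.70, 0.15, 0.36, 0.47, 0.20, 0.40],
})
df1 = df.set_index('Expression')[['PR', 'Metrics']]

# Cosine similarity: dot products divided by the product of the row norms
values = df1.to_numpy()
norms = np.linalg.norm(values, axis=1)
cosine = pd.DataFrame(
    values @ values.T / np.outer(norms, norms),
    index=df1.index,
    columns=df1.index,
)
print(cosine.round(3))

# Row-wise Pearson: every expression becomes a column with just two observations,
# so each entry comes out as +1 or -1 (up to floating-point error)
row_corr = df1.T.corr()
print(row_corr.round(3))
```

Either DataFrame can be passed to sb.heatmap(..., annot=True) in the same way as above. With only two measurements per expression, though, any of these matrices should be read with caution.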