Search code examples
pythonpandascsvmatrixcorrelation

Convert column pairwise individual row data to correlation matrix


I'm trying to convert my z scores (axa),(axb) values as shown below to a correlation matrix.

enter image description here

The individual row correlation matrix of the first row as shown below:

enter image description here

Are there any way I can transform my csv easily in python? After tranformation I should be able to do:

import pandas as pd
con = pd.read_csv('Data/connectivity.csv')
# Do transform here
cormat = con.corr()
round(cormat,2)
sns.heatmap(cormat);

Thank you.

Kevin

EDIT

Sorry for the edit so I forgot to mention I was going to create only one connectivity matrix (correlation matrix) by taking the average of in this example the 4 rows.

so have to do a:

import pandas as pd
con = pd.read_csv('Data/connectivity.csv')
con_average = con.mean()
tmp = con_average.set_axis(df.columns.str.split(' vs ',expand=True))
out = tmp.iloc[0].unstack()

Solution

  • Assuming this input:

       a x a     a x b     a x c     b x a  b x b    b x c    c x a    c x b  c x c
    0      1  0.315900    0.5391  0.315900      1  0.45100  0.53910  0.45100      1
    1      1  0.387880   0.30813  0.387880      1  0.33133  0.30813  0.33133      1
    2      1 -0.021537  -0.30658 -0.021537      1  0.68163 -0.30658  0.68163      1
    3      1 -0.001746   0.11458 -0.001746      1  0.39171  0.11458  0.39171      1
    

    You can first split the column names into a MultiIndex:

    tmp = df.set_axis(df.columns.str.split(' x ', expand=True), axis=1)
    

    Then, if you want to build your matrix from the first row, slice it and unstack:

    out = tmp.iloc[0].unstack()
    

    Output:

            a       b       c
    a       1  0.3159  0.5391
    b  0.3159       1   0.451
    c  0.5391   0.451       1
    

    To build a new dataframe for each row, either loop:

    dfs = [tmp.iloc[i].unstack() for i in range(tmp.shape[0])]
    

    Or first stack then split with groupby:

    dfs = [g for _, g in tmp.stack().groupby(level=0)]
    

    Output:

    [          a       b       c
     0 a       1  0.3159  0.5391
       b  0.3159  1.0000  0.4510
       c  0.5391  0.4510  1.0000,
                 a        b        c
     1 a         1  0.38788  0.30813
       b   0.38788  1.00000  0.33133
       c  0.30813,  0.33133  1.00000,
                 a         b        c
     2 a         1 -0.021537 -0.30658
       b -0.021537  1.000000  0.68163
       c  -0.30658  0.681630  1.00000,
                 a         b        c
     3 a         1 -0.001746  0.11458
       b -0.001746  1.000000  0.39171
       c   0.11458  0.391710  1.00000]
    

    If you want to average the rows:

    out = (df.set_axis(df.columns.str.split(' x ', expand=True), axis=1)
             .mean().unstack()
          )
    

    Output:

              a         b         c
    a  1.000000  0.170124  0.163808
    b  0.170124  1.000000  0.463917
    c  0.163808  0.463917  1.000000
    

    Intermediate tmp:

       a                             b                    c            
       a         b         c         a  b        c        a        b  c
    0  1  0.315900    0.5391  0.315900  1  0.45100  0.53910  0.45100  1
    1  1  0.387880   0.30813  0.387880  1  0.33133  0.30813  0.33133  1
    2  1 -0.021537  -0.30658 -0.021537  1  0.68163 -0.30658  0.68163  1
    3  1 -0.001746   0.11458 -0.001746  1  0.39171  0.11458  0.39171  1