I'm trying to convert my z scores (axa),(axb) values as shown below to a correlation matrix.
The individual row correlation matrix of the first row as shown below:
Are there any way I can transform my csv easily in python? After tranformation I should be able to do:
import pandas as pd
con = pd.read_csv('Data/connectivity.csv')
# Do transform here
cormat = con.corr()
round(cormat,2)
sns.heatmap(cormat);
Thank you.
Kevin
EDIT
Sorry for the edit so I forgot to mention I was going to create only one connectivity matrix (correlation matrix) by taking the average of in this example the 4 rows.
so have to do a:
import pandas as pd
con = pd.read_csv('Data/connectivity.csv')
con_average = con.mean()
tmp = con_average.set_axis(df.columns.str.split(' vs ',expand=True))
out = tmp.iloc[0].unstack()
Assuming this input:
a x a a x b a x c b x a b x b b x c c x a c x b c x c
0 1 0.315900 0.5391 0.315900 1 0.45100 0.53910 0.45100 1
1 1 0.387880 0.30813 0.387880 1 0.33133 0.30813 0.33133 1
2 1 -0.021537 -0.30658 -0.021537 1 0.68163 -0.30658 0.68163 1
3 1 -0.001746 0.11458 -0.001746 1 0.39171 0.11458 0.39171 1
You can first split
the column names into a MultiIndex:
tmp = df.set_axis(df.columns.str.split(' x ', expand=True), axis=1)
Then, if you want to build your matrix from the first row, slice it and unstack
:
out = tmp.iloc[0].unstack()
Output:
a b c
a 1 0.3159 0.5391
b 0.3159 1 0.451
c 0.5391 0.451 1
To build a new dataframe for each row, either loop:
dfs = [tmp.iloc[i].unstack() for i in range(tmp.shape[0])]
Or first stack
then split with groupby
:
dfs = [g for _, g in tmp.stack().groupby(level=0)]
Output:
[ a b c
0 a 1 0.3159 0.5391
b 0.3159 1.0000 0.4510
c 0.5391 0.4510 1.0000,
a b c
1 a 1 0.38788 0.30813
b 0.38788 1.00000 0.33133
c 0.30813, 0.33133 1.00000,
a b c
2 a 1 -0.021537 -0.30658
b -0.021537 1.000000 0.68163
c -0.30658 0.681630 1.00000,
a b c
3 a 1 -0.001746 0.11458
b -0.001746 1.000000 0.39171
c 0.11458 0.391710 1.00000]
If you want to average the rows:
out = (df.set_axis(df.columns.str.split(' x ', expand=True), axis=1)
.mean().unstack()
)
Output:
a b c
a 1.000000 0.170124 0.163808
b 0.170124 1.000000 0.463917
c 0.163808 0.463917 1.000000
Intermediate tmp
:
a b c
a b c a b c a b c
0 1 0.315900 0.5391 0.315900 1 0.45100 0.53910 0.45100 1
1 1 0.387880 0.30813 0.387880 1 0.33133 0.30813 0.33133 1
2 1 -0.021537 -0.30658 -0.021537 1 0.68163 -0.30658 0.68163 1
3 1 -0.001746 0.11458 -0.001746 1 0.39171 0.11458 0.39171 1