I have a three column csv file that I am trying to convert to a clustered heatmap. My code looks like this:
sum_mets = pd.read_csv('sum159_localization_met_magma.csv')
df5 = sum_mets[['Phenotype','Gene','P']]
clustermap5 = sns.clustermap(df5, cmap= 'inferno', figsize=(40, 40), pivot_kws={'index': 'Phenotype',
'columns' : 'Gene',
'values' : 'P'})
I then receive this ValueError:
ValueError: The condensed distance matrix must contain only finite values.
For context all of my values are non-zero. I am not sure what values is it unable to process. Thank you in advance to anyone who can help.
While you have no NaN, you need to check whether your observations are complete, because there is a pivot underneath, for example:
df = pd.DataFrame({'Phenotype':np.repeat(['very not cool','not cool','very cool','super cool'],4),
'Gene':["Gene"+str(i) for i in range(4)]*4,
'P':np.random.uniform(0,1,16)})
pd.pivot(df,columns="Gene",values="P",index="Phenotype")
Gene Gene0 Gene1 Gene2 Gene3
Phenotype
not cool 0.567653 0.984555 0.634450 0.406642
super cool 0.820595 0.072393 0.774895 0.185072
very cool 0.231772 0.448938 0.951706 0.893692
very not cool 0.227209 0.684660 0.013394 0.711890
The above pivots without NaN, and plots well:
sns.clustermap(df,figsize=(5, 5),pivot_kws={'index': 'Phenotype','columns' : 'Gene','values' : 'P'})
but let's say if we have 1 less observation:
df1 = df[:15]
pd.pivot(df1,columns="Gene",values="P",index="Phenotype")
Gene Gene0 Gene1 Gene2 Gene3
Phenotype
not cool 0.106681 0.415873 0.480102 0.721195
super cool 0.961991 0.261710 0.329859 NaN
very cool 0.069925 0.718771 0.200431 0.196573
very not cool 0.631423 0.403604 0.043415 0.373299
And it fails if you try to call clusterheatmap:
sns.clustermap(df1, pivot_kws={'index': 'Phenotype','columns' : 'Gene','values' : 'P'})
The condensed distance matrix must contain only finite values.
I suggest checking whether the missing values are intended or a mistake. So if you indeed have some missing values, you can get around the clustering but pre-computing the linkage and passing it to the function, for example using correlation below:
import scipy.spatial as sp, scipy.cluster.hierarchy as hc
row_dism = 1 - df1.T.corr()
row_linkage = hc.linkage(sp.distance.squareform(row_dism), method='complete')
col_dism = 1 - df1.corr()
col_linkage = hc.linkage(sp.distance.squareform(col_dism), method='complete')
sns.clustermap(df1,figsize=(5, 5),row_linkage=row_linkage, col_linkage=col_linkage)