How to fix Seaborn clustermap "condensed distance matrix must contain only finite values" error?

I have a three-column CSV file that I am trying to convert to a clustered heatmap. My code looks like this:

import pandas as pd
import seaborn as sns

sum_mets = pd.read_csv('sum159_localization_met_magma.csv')
df5 = sum_mets[['Phenotype', 'Gene', 'P']]

clustermap5 = sns.clustermap(df5, cmap='inferno', figsize=(40, 40),
                             pivot_kws={'index': 'Phenotype',
                                        'columns': 'Gene',
                                        'values': 'P'})

I then receive this ValueError:

ValueError: The condensed distance matrix must contain only finite values.

For context, all of my values are non-zero. I am not sure which values it is unable to process. Thank you in advance to anyone who can help.

Solution

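Even if your CSV contains no NaN, you need to check that every Phenotype/Gene combination is present. clustermap pivots the data underneath (that is what pivot_kws is for), and any combination missing from the long table becomes a NaN in the pivoted one. For example: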

import numpy as np, pandas as pd
import seaborn as sns

df = pd.DataFrame({'Phenotype':np.repeat(['very not cool','not cool','very cool','super cool'],4),
                   'Gene':["Gene"+str(i) for i in range(4)]*4,
                   'P':np.random.uniform(0,1,16)})

pd.pivot(df,columns="Gene",values="P",index="Phenotype")

Gene              Gene0     Gene1     Gene2     Gene3
Phenotype
not cool       0.567653  0.984555  0.634450  0.406642
super cool     0.820595  0.072393  0.774895  0.185072
very cool      0.231772  0.448938  0.951706  0.893692
very not cool  0.227209  0.684660  0.013394  0.711890

The above pivots without NaN, and plots well:

sns.clustermap(df,figsize=(5, 5),pivot_kws={'index': 'Phenotype','columns' : 'Gene','values' : 'P'})

But suppose we have one fewer observation:

df1 = df[:15]
pd.pivot(df1,columns="Gene",values="P",index="Phenotype")

Gene              Gene0     Gene1     Gene2     Gene3
Phenotype
not cool       0.106681  0.415873  0.480102  0.721195
super cool     0.961991  0.261710  0.329859       NaN
very cool      0.069925  0.718771  0.200431  0.196573
very not cool  0.631423  0.403604  0.043415  0.373299

The pivoted table now contains a NaN, and clustermap fails when you call it:

sns.clustermap(df1, pivot_kws={'index': 'Phenotype','columns' : 'Gene','values' : 'P'})
ValueError: The condensed distance matrix must contain only finite values.
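To find out which Phenotype/Gene combinations are missing in your own data, you can do the pivot yourself and list the empty cells. A minimal sketch on the toy data from above (the names `wide` and `missing` are only illustrative, and the fixed seed is there just to make the example repeatable):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'Phenotype': np.repeat(['very not cool', 'not cool', 'very cool', 'super cool'], 4),
                   'Gene': ["Gene" + str(i) for i in range(4)] * 4,
                   'P': rng.uniform(0, 1, 16)})
df1 = df[:15]  # drop the last observation (super cool / Gene3)

# the same pivot that pivot_kws asks clustermap to perform
wide = df1.pivot(index='Phenotype', columns='Gene', values='P')

# positions of the NaN cells, translated back to row/column labels
rows, cols = np.where(wide.isna().to_numpy())
missing = [(wide.index[r], wide.columns[c]) for r, c in zip(rows, cols)]
print(missing)
```

On the toy data this reports the single (super cool, Gene3) pair. On your real table, an empty list would mean the pivot is complete and the problem lies elsewhere (infinite values, for instance).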

I suggest checking whether the missing values are intended or a mistake. If you really do have missing values, you can get around the problem by pre-computing the linkage yourself and passing it to the function, for example using correlation as the distance:

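import scipy.spatial as sp, scipy.cluster.hierarchy as hc

# work on the pivoted (wide) table, since no pivot_kws is passed below
wide = pd.pivot(df1, columns="Gene", values="P", index="Phenotype")

# pandas' corr() skips NaN pairwise, so the distance matrices stay finite
row_dism = 1 - wide.T.corr()
row_linkage = hc.linkage(sp.distance.squareform(row_dism), method='complete')
col_dism = 1 - wide.corr()
col_linkage = hc.linkage(sp.distance.squareform(col_dism), method='complete')

sns.clustermap(wide, figsize=(5, 5), row_linkage=row_linkage, col_linkage=col_linkage)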
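If the gaps turn out to be a mistake, or you would rather let clustermap do the clustering itself, another option is to make the pivoted table complete before plotting. A rough sketch, assuming that either dropping incomplete genes or filling the gaps is acceptable for your analysis (the fill value of 1.0, i.e. "not significant" for a p-value, is purely an assumption on my part; choose whatever makes sense for your data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'Phenotype': np.repeat(['very not cool', 'not cool', 'very cool', 'super cool'], 4),
                   'Gene': ["Gene" + str(i) for i in range(4)] * 4,
                   'P': rng.uniform(0, 1, 16)})
df1 = df[:15]  # one observation missing
wide = df1.pivot(index='Phenotype', columns='Gene', values='P')

# option 1: keep only the genes that were observed for every phenotype
complete = wide.dropna(axis=1)

# option 2: fill the gaps with a placeholder (assumed value, see above)
filled = wide.fillna(1.0)

print(complete.shape)
print(filled.isna().any().any())
```

Either `complete` or `filled` can then be passed straight to sns.clustermap without pivot_kws, since neither contains NaN any more. Keep in mind that filling changes the distances, so the resulting dendrogram reflects the placeholder as well as the real data.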