python scipy seaborn hierarchical-clustering

Pass distance matrix to seaborn clustermap


I want to pass my own distance matrix (row linkages) to seaborn clustermap.

There are already some posts on this, such as

Use Distance Matrix in scipy.cluster.hierarchy.linkage()?

but they all point to scipy.cluster.hierarchy.linkage, which takes the clustering metric and method as arguments:

scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean', optimal_ordering=False)

The input y may be either a 1-D condensed distance matrix or a 2-D array of observation vectors.

What I don't get is this:

My distance matrix is already based on a certain metric and method, so why would I want to recalculate this in scipy.cluster.hierarchy.linkage?

Is there an option where it purely uses my distances and creates the linkages?


Solution

  • For posterity, here is a complete way to do this, since @WarrenWeckesser in the comments and @SibbsGambling in the linked answer leave out some details.

    Suppose distMatrix is your square matrix of distances (they do not have to be Euclidean), where the entry in row i and column j is the distance between the ith and jth objects. Then:

    # import packages
    from scipy.cluster import hierarchy
    import scipy.spatial.distance as ssd
    import seaborn as sns
    
    # convert the square distance matrix to condensed (1-D) form, as in the linked answer
    distArray = ssd.squareform(distMatrix)
    
    # compute the linkage from the precomputed distances (default method='single')
    distLinkage = hierarchy.linkage(distArray)
    
    # make clustermap
    sns.clustermap(distMatrix, row_linkage=distLinkage, col_linkage=distLinkage)
    

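    Note that when creating the clustermap, you still have to pass the original matrix as the data to plot; the linkages only determine the ordering and the dendrograms. Because distArray is a condensed distance matrix, linkage uses your distances as given and ignores its metric argument. The method argument only controls how distances between clusters are derived from them. If you want a clustering method other than the default 'single', such as method='ward', include that option when defining distLinkage (keep in mind that 'ward', 'centroid' and 'median' are only well defined for Euclidean distances).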
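
    To see why the squareform step matters, here is a minimal sketch using a made-up 4x4 distance matrix (the values and the names dist_matrix, link_from_distances and link_from_rows are illustrative assumptions, not from the question). It compares linkage on the condensed form, which uses the distances as given, with linkage on the square matrix, which treats each row as an observation vector and recomputes Euclidean distances.

```python
import warnings

import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

# Hypothetical distance matrix: symmetric with a zero diagonal.
dist_matrix = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
])

# Condensed form: the upper triangle read row by row.
condensed = squareform(dist_matrix)

# Condensed input: the precomputed distances are used directly.
link_from_distances = hierarchy.linkage(condensed, method="average")

# Square input: rows are treated as observation vectors, so Euclidean
# distances between rows are computed instead. SciPy may warn about
# this, so warnings are silenced here for the comparison only.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    link_from_rows = hierarchy.linkage(dist_matrix, method="average")

# Column 2 of a linkage matrix holds the merge heights.
print("first merge, condensed input:", link_from_distances[0, 2])
print("first merge, square input:   ", link_from_rows[0, 2])
```

    With the condensed input, the first merge happens at the smallest entry of the matrix (1.0 here); with the square input the heights come from recomputed Euclidean distances between rows, so the two linkages differ. Either array can be passed as row_linkage or col_linkage, but only the first reflects your own distances.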