python scipy seaborn dendrogram

Extract dendrogram from seaborn clustermap


Consider the following example, taken from https://python-graph-gallery.com/404-dendrogram-with-heat-map/

It generates a dendrogram, which I assume is computed with scipy.

# Libraries
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt

# Data set
url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
df = pd.read_csv(url)
df = df.set_index('model')
df.index.name = None  # remove the index label
df

# Default plot
sns.clustermap(df)

Question: How can one get the dendrogram in non-graphical form?

Background information: I want to cut the dendrogram at the longest edge leaving the root. For example, the root has one edge to a left cluster (L) and one edge to a right cluster (R). I'd like to get the lengths of these two edges and cut the whole dendrogram at the longer of the two.

Best regards


Solution

  • clustermap returns a handle to the ClusterGrid object, which includes a child object for each dendrogram: h.dendrogram_col and h.dendrogram_row. Each of these holds the dendrogram itself, which provides the dendrogram geometry in the form returned by scipy.cluster.hierarchy.dendrogram. From that you can compute the length of a specific branch.

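    import numpy as np

    h = sns.clustermap(df)
    dgram = h.dendrogram_col.dendrogram
    D = np.array(dgram['dcoord'])  # heights (distance coordinates) of each link
    I = np.array(dgram['icoord'])  # positions of each link along the leaf axis

    # The root link is the last entry. Each row of D is
    # [left child height, merge height, merge height, right child height],
    # so the lengths of the L/R branches are
    yy = D[-1]
    lenL = yy[1] - yy[0]
    lenR = yy[2] - yy[3]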
    

    The linkage matrix, the input used to compute the dendrogram, might also help:

    h.dendrogram_col.linkage
    h.dendrogram_row.linkage
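
As a rough sketch of the linkage-based route, the root's two branch lengths can also be read from the linkage matrix and then used to cut the tree with scipy.cluster.hierarchy.fcluster. The helper name root_branch_lengths and the toy data are made up for illustration; in practice Z would be h.dendrogram_row.linkage or h.dendrogram_col.linkage. Cutting at the midpoint of the longer edge is only one possible reading of "cut at the longest edge", and the left/right assignment assumes scipy's default child ordering.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def root_branch_lengths(Z):
    """Hypothetical helper: (left length, right length, root height) of the root merge."""
    n = Z.shape[0] + 1  # number of original observations
    left, right, root_height = int(Z[-1, 0]), int(Z[-1, 1]), Z[-1, 2]

    def height(node):
        # Leaves (index < n) sit at height 0; merged clusters sit at their merge distance.
        return 0.0 if node < n else Z[node - n, 2]

    return root_height - height(left), root_height - height(right), root_height


# Toy data standing in for h.dendrogram_row.linkage (assumption for this sketch)
points = np.array([[0.0], [1.0], [10.0], [12.0], [13.0]])
Z = linkage(points, method="average")

len_left, len_right, root_height = root_branch_lengths(Z)

# One interpretation of the cut: halfway down the longer of the two root edges
cut = root_height - 0.5 * max(len_left, len_right)
labels = fcluster(Z, t=cut, criterion="distance")
```

Here labels holds one flat cluster id per observation. If the shorter edge's child was merged at a height above the cut, that side is split further, so the threshold may need adjusting for the intended result.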