Tags: matplotlib, scikit-learn, label, dendrogram

How to color a dendrogram's labels according to defined groups? (in Python)


I would like the labels of the plot to be drawn in the same color as their groups. How should I do it?

A simple test example:

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt


mat = np.array([[1.0,  0.5,  0.0],
                [0.5,  1.0, -0.5],
                [1.0, -0.5,  0.5],
                [0.0,  0.5, -0.5]])

dist_mat = mat
linkage_matrix = linkage(dist_mat, "single")

plt.clf()

ddata = dendrogram(linkage_matrix, color_threshold=0.8)

The previous code generates this plot:

[Image: dendrogram produced by the code above]

But I want the labels for indices 0 and 2 in blue and the labels for indices 1 and 3 in red.


Solution

    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, linkage
    import matplotlib.pyplot as plt


    mat = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, -0.5], [1.0, -0.5, 0.5], [0.0, 0.5, -0.5]])

    dist_mat = mat
    linkage_matrix = linkage(dist_mat, "single")

    # plt.clf()

    ddata = dendrogram(linkage_matrix, color_threshold=0.8)

    # dendrogram() returns a dict. Its "leaves_color_list" entry holds one color
    # per leaf, in the same left-to-right order as the x tick labels, so we can
    # zip the two and recolor each label.
    # Note that "leaves_color_list" is different from "color_list", which holds
    # the colors of the links (as shown in the picture).
    # For the latest names of these keys, please refer to the SciPy docs:
    # https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.dendrogram.html
    for leaf, leaf_color in zip(plt.gca().get_xticklabels(), ddata["leaves_color_list"]):
        leaf.set_color(leaf_color)
    plt.show()
    

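    The output is shown below. The picture highlights the difference between the two keys: color_list holds the colors of the links, while leaves_color_list holds the colors of the leaves.

    [Image: dendrogram with colored leaf labels, with color_list and leaves_color_list highlighted]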
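
The solution above gives each label the color SciPy assigned to its cluster. If the groups are defined by hand instead, as in the question (0 and 2 in blue, 1 and 3 in red), one possible approach is to look up each label's original row index in ddata["leaves"] and color it from your own mapping. The following is only a sketch under that assumption: the group_colors dict and the label_colors helper dict are hypothetical names introduced for illustration, and the Agg backend line is there only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

mat = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, -0.5],
                [1.0, -0.5, 0.5],
                [0.0, 0.5, -0.5]])
linkage_matrix = linkage(mat, "single")

# Hypothetical user-defined mapping: original row index -> label color
group_colors = {0: "blue", 2: "blue", 1: "red", 3: "red"}

fig, ax = plt.subplots()
ddata = dendrogram(linkage_matrix, color_threshold=0.8, ax=ax)

# ddata["leaves"] lists the original row indices in left-to-right plot order,
# which is the same order as the x tick labels.
label_colors = {}  # recorded only so the result can be inspected afterwards
for tick, leaf_index in zip(ax.get_xticklabels(), ddata["leaves"]):
    tick.set_color(group_colors[leaf_index])
    label_colors[leaf_index] = tick.get_color()

# Call plt.show() or fig.savefig(...) here to view the figure.
```

This only changes the tick label colors. The link colors are still chosen by dendrogram() from color_threshold, so labels and branches may not match unless the hand-defined groups coincide with the clusters.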