Tags: python, seaborn, hierarchical-clustering, dendrogram

Dendrogram coloring by groups


I created a heatmap from a Spearman correlation matrix using seaborn's clustermap, as shown below. I want to color the branches of the dendrogram by group, like in this example dendrogram, but drawn on the heatmap.

I created a dict of colors as follows and got an error:

def assign_tree_colour(name, val_dict, coding_names_df):
    ret = None
    if val_dict.get(name, '') == 'Group 1':
        ret = "(0,0.9,0.4)"   # green
    elif val_dict.get(name, '') == 'Group 2':
        ret = "(0.6,0.1,0)"   # red
    elif val_dict.get(name, '') == 'Group 3':
        ret = "(0.3,0.8,1)"   # light blue
    elif val_dict.get(name, '') == 'Group 4':
        ret = "(0.4,0.1,1)"   # purple
    elif val_dict.get(name, '') == 'Group 5':
        ret = "(1,0.9,0.1)"   # yellow
    elif val_dict.get(name, '') == 'Group 6':
        ret = "(0,0,0)"       # black
    else:
        ret = "(0,0,0)"       # black
    return ret

def fix_string(str):
    return str.replace('"', '')

external_data3 = [list(z) for z in coding_names_df.values]
external_data3 = {fix_string(z[0]): z[3] for z in external_data3}

tree_label = list(df.index)
tree_label = [fix_string(x) for x in tree_label]
tree_labels = { j : tree_label[j] for j in range(0, len(tree_label) ) }

tree_colour = [assign_tree_colour(label, external_data3, coding_names_df) for label in tree_labels]
tree_colors = { i : tree_colour[i] for i in range(0, len(tree_colour) ) }


sns.set(color_codes=True)
sns.set(font_scale=1)
g = sns.clustermap(df, cmap="bwr",
                   vmin=-1, vmax=1,
                   yticklabels=1, xticklabels=1,
                   cbar_kws={"ticks":[-1,-0.5,0,0.5,1]},
                   figsize=(13,13),
                   row_colors=row_colors,
                   col_colors=col_colors,
                   method='average',
                   metric='correlation',
                   tree_kws=dict(colors=tree_colors))
g.ax_heatmap.set_xlabel('Genus')
g.ax_heatmap.set_ylabel('Genus')
for label in Group.unique():
    g.ax_col_dendrogram.bar(0, 0, color=lut[label],
                            label=label, linewidth=0)
g.ax_col_dendrogram.legend(loc=9, ncol=7, bbox_to_anchor=(0.26, 0., 0.5, 1.5))
ax=g.ax_heatmap



  File "<ipython-input-64-4bc6be89afe3>", line 11, in <module>
    tree_kws=dict(colors=tree_colors))

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1391, in clustermap
    tree_kws=tree_kws, **kwargs)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1208, in plot
    tree_kws=tree_kws)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1054, in plot_dendrograms
    tree_kws=tree_kws

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 776, in dendrogram
    return plotter.plot(ax=ax, tree_kws=tree_kws)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 692, in plot
    **tree_kws)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\collections.py", line 1316, in __init__
    colors = mcolors.to_rgba_array(colors)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 294, in to_rgba_array
    result[i] = to_rgba(cc, alpha)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 177, in to_rgba
    rgba = _to_rgba_no_colorcycle(c, alpha)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 240, in _to_rgba_no_colorcycle
    raise ValueError("Invalid RGBA argument: {!r}".format(orig_c))

ValueError: Invalid RGBA argument: 0

Any help on this would be greatly appreciated! Thanks!


Solution

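  • According to the sns.clustermap documentation, the dendrogram coloring can be set through tree_kws (which takes a dict) and its colors entry, which expects a list of RGB tuples such as (0.5, 0.5, 1). It also seems that colors accepts nothing except RGB tuple data. In your code, tree_colors is a dict whose values are strings such as "(0,0.9,0.4)" rather than tuples; the last line of the traceback, "Invalid RGBA argument: 0", suggests that matplotlib tried to read the dict key 0 as a color.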
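
As a minimal sketch of the point above, the colors can be kept as real tuples and collected into a plain list instead of a dict. The names GROUP_COLOURS, DEFAULT_COLOUR and colours_for, as well as the sample labels, are hypothetical; only the color values are taken from the question.

```python
# Hypothetical lookup table: the colours from the question,
# written as RGB tuples instead of strings.
GROUP_COLOURS = {
    "Group 1": (0.0, 0.9, 0.4),  # green
    "Group 2": (0.6, 0.1, 0.0),  # red
    "Group 3": (0.3, 0.8, 1.0),  # light blue
    "Group 4": (0.4, 0.1, 1.0),  # purple
    "Group 5": (1.0, 0.9, 0.1),  # yellow
}
DEFAULT_COLOUR = (0.0, 0.0, 0.0)  # black, also used for unknown labels


def colours_for(labels, label_to_group):
    """Return a list with one RGB tuple per label, in the order of `labels`."""
    return [
        GROUP_COLOURS.get(label_to_group.get(label), DEFAULT_COLOUR)
        for label in labels
    ]


# Hypothetical sample data standing in for df.index and coding_names_df.
labels = ["Bacteroides", "Prevotella", "Unknown"]
label_to_group = {"Bacteroides": "Group 1", "Prevotella": "Group 2"}

tree_colours = colours_for(labels, label_to_group)
print(tree_colours)
```

A list like this could then be passed as tree_kws=dict(colors=tree_colours). Note that this only fixes the type of the argument; the list is still consumed in the order in which the dendrogram lines are drawn.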

    Note that clustermap also supports nested lists or data frames for row_colors and col_colors, which draws hierarchical color bars between the dendrograms and the heatmap. They could be useful if the dendrograms get too crowded.

    I hope this helps!

    Edit

    The list of RGB tuples is the sequence of line colors passed to LineCollection, which applies the colors in order as it draws each line in both dendrograms. (The order seems to start from the rightmost branch of the column dendrogram.) To associate a particular color with a data point, you need to work out the order in which the dendrogram lines are drawn.
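
One possible way to work out that order is sketched below. It assumes that seaborn takes its line coordinates from scipy.cluster.hierarchy.dendrogram and draws the U-shaped links in the order scipy returns them, which I have not checked against every seaborn version. The helper link_colors_in_draw_order and the toy data are hypothetical. The trick is to let link_color_func return the cluster id as a string, so that color_list reports the cluster ids in drawing order.

```python
import numpy as np
from scipy.cluster import hierarchy


def link_colors_in_draw_order(Z, leaf_groups, group_colours, default=(0, 0, 0)):
    """Return one colour per dendrogram link, in scipy's drawing order.

    A link gets its group's colour only if every leaf below it belongs to
    the same group; mixed links get `default`.
    """
    # groups[k] is the set of group names found under cluster id k.
    # Ids 0..n-1 are the leaves; row i of Z creates cluster id n + i.
    groups = [{g} for g in leaf_groups]
    for left, right, _dist, _count in Z:
        groups.append(groups[int(left)] | groups[int(right)])

    # link_color_func must return a string, so return the cluster id itself;
    # color_list then holds the cluster ids in drawing order.
    info = hierarchy.dendrogram(Z, no_plot=True, link_color_func=lambda k: str(k))
    draw_order = [int(k) for k in info["color_list"]]

    colours = []
    for k in draw_order:
        members = groups[k]
        if len(members) == 1:
            colours.append(group_colours[next(iter(members))])
        else:
            colours.append(default)
    return colours


# Toy data: two tight pairs of points, one pair per group.
X = np.array([[0.0], [0.1], [10.0], [10.2]])
Z = hierarchy.linkage(X, method="average")
leaf_groups = ["A", "A", "B", "B"]
group_colours = {"A": (1, 0, 0), "B": (0, 0, 1)}

colours = link_colors_in_draw_order(Z, leaf_groups, group_colours)
print(colours)
```

For this to line up with the plot, the same linkage would have to be handed to clustermap (for example row_linkage=Z and, for a symmetric correlation matrix, col_linkage=Z) together with tree_kws={'colors': colours}. There is one color per link, so a tree with n leaves needs n - 1 colors.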

    Edit II

    Here's a minimal example for coloring the tree based on sns.clustermap examples:

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    sns.set(color_codes=True)

    iris = sns.load_dataset("iris")
    species = iris.pop("species")
    # Plain clustermap without any coloring
    g = sns.clustermap(iris)
    # Map each species to a single-letter color code for the sidebar
    lut = dict(zip(species.unique(), "rbg"))
    row_colors = species.map(lut)
    # For demonstrating the hierarchical sidebar coloring: one column per color
    df_colors = pd.DataFrame(data={'r': row_colors[row_colors == 'r'], 'g': row_colors[row_colors == 'g'], 'b': row_colors[row_colors == 'b']})
    # Simple RGBA color per class
    colmap = {'setosa': (1, 0, 0, 0.7), 'virginica': (0, 1, 0, 0.7), 'versicolor': (0, 0, 1, 0.7)}
    # The colors list is consumed in drawing order, not in the order of the data points
    g = sns.clustermap(iris, row_colors=df_colors, tree_kws={'colors': [colmap[s] for s in species]})
    plt.savefig('clustermap.png')

    [Image: clustermap.png]

    As you can see, the tree lines are drawn starting from the upper right corner of the image, so their order is not tied to the order of the data points shown in the clustermap. The color bars (controlled by the row_colors and col_colors arguments), on the other hand, could be used for that purpose.