Tags: r, bioinformatics, hierarchical-clustering, dendrogram, dendextend

R hclust -> dendrogram -> phylo?


I have hclust hierarchical clustering objects with hundreds of nodes and long labels, for example synonyms of multiple genes within a family (see below).

I would like to cut the hclust trees into smaller subtrees and then visualize them with flexible styles. Following http://gastonsanchez.com/blog/how-to/2012/10/03/Dendrograms.html, I see how to cut dendrograms and pretty-plot ape phylogenetic trees.

I just don't see any method for converting the cut dendrograms into phylo objects.

> as.phylo(as.dendrogram(hc))
Error in UseMethod("as.phylo") : 
  no applicable method for 'as.phylo' applied to an object of class "dendrogram"

I'm open to any method which would render circular or vertically oriented subtrees.
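For what it's worth, the error arises because ape's as.phylo() has a method for hclust objects but none for dendrogram objects. A minimal sketch of one workaround, assuming ape is installed: convert the whole hclust tree to phylo once, get cluster memberships with stats::cutree(), and extract each cluster as its own phylo subtree with ape::drop.tip(). The toy hc here (edit distances between a few of the synonyms listed below) and the choice of k = 3 are illustrative assumptions; the real hc may have been built differently.

```r
library(ape)  # as.phylo() (has an "hclust" method), drop.tip(), plot.phylo()

# Stand-in for the question's `hc`: a few synonyms clustered by edit distance.
synonyms <- c("beta 1-AR", "beta 2-AR", "beta 3-AR", "B1AR",
              "alpha-1B adrenoceptor", "alpha-1D adrenoceptor",
              "alpha-2AAR", "alpha2-C4")
d <- adist(synonyms, ignore.case = TRUE)
dimnames(d) <- list(synonyms, synonyms)
hc <- hclust(as.dist(d), method = "average")

# Convert the whole tree once; this dispatches to ape's hclust method.
phy <- as.phylo(hc)

# Cluster membership from the hclust object (k = 3 is an arbitrary choice).
# cutree() returns an integer vector named by the leaf labels.
groups <- cutree(hc, k = 3)

# One phylo subtree per cluster with at least two members:
# keep that cluster's tips by dropping every other tip.
members <- split(names(groups), groups)
members <- members[lengths(members) >= 2]
subtrees <- lapply(members, function(keep) {
  drop.tip(phy, setdiff(phy$tip.label, keep))
})

# Circular ("fan") layout for each subtree.
for (st in subtrees) plot(st, type = "fan", cex = 0.7)
```

For vertically oriented subtrees, plot(st, direction = "downwards") can be used instead of the fan layout.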

In fact, my goal is to visually detect patterns in the gene synonyms so that I can write something like Mustache templates for them, so I'm even open to solutions that don't involve dendrograms. There are a handful of SO posts about multiple sequence alignments of plain text, but they go a little over my head.

> receptor.synonyms
                                 synonym
1                alpha1B-adrenergic receptor
2                                       B1AR
3              adrenergic receptor, alpha 2a
4                                  beta 3-AR
5                                 alpha-2AAR
6                                  alpha2-C4
7                                     Adrb-1
8                                       Badm
9                                  beta 1-AR
10    Adrenergic, alpha2C-, receptor class I
11                     alpha-1D adrenoceptor
12                                 beta 2-AR    
13                       adrenergic receptor
14              alpha-2A-adrenergic receptor
15  Adrenergic, alpha2B-, receptor class III
16            adrenergic, alpha 1B, receptor
17                    &alpha;<sub>2</sub>-C2
18           adrenergic, alpha-1A-, receptor
19                                   ADRARL1
20                     alpha-1B adrenoceptor
--- snip ---
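The question doesn't say how hc was built. A minimal, hypothetical sketch, assuming the tree comes from pairwise edit distances between the synonym strings (base R adist(), generalized Levenshtein distance) with average linkage, using the readable entries from the listing above:

```r
# Hypothetical reconstruction: the question does not state how `hc` was built.
synonyms <- c("alpha1B-adrenergic receptor", "B1AR",
              "adrenergic receptor, alpha 2a", "beta 3-AR", "alpha-2AAR",
              "alpha2-C4", "Adrb-1", "Badm", "beta 1-AR",
              "Adrenergic, alpha2C-, receptor class I",
              "alpha-1D adrenoceptor", "beta 2-AR", "adrenergic receptor",
              "alpha-2A-adrenergic receptor",
              "Adrenergic, alpha2B-, receptor class III",
              "adrenergic, alpha 1B, receptor",
              "adrenergic, alpha-1A-, receptor", "ADRARL1",
              "alpha-1B adrenoceptor")

# Edit distance between every pair of labels, ignoring case so that
# "Adrenergic" and "adrenergic" are not counted as different.
d <- adist(synonyms, ignore.case = TRUE)
dimnames(d) <- list(synonyms, synonyms)

# Average-linkage clustering on the pairwise edit distances.
hc <- hclust(as.dist(d), method = "average")

# hang = -1 lines all labels up at the bottom of the plot.
plot(hc, cex = 0.6, hang = -1)
```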



Solution

  • Since the post you linked to was published, a lot of work has gone into manipulating hclust output through the dendrogram object with the dendextend R package. For example, you can remove leaves (and their labels) with the "prune" function, use "cutree" directly on a dendrogram, color the branches, and much more.

    You can learn more about the package from the post/journal article "dendextend: a package for visualizing, adjusting, and comparing dendrograms" (based on a paper published in Bioinformatics).


    For more advanced features, such as circular plots, see the package vignette: Introduction to dendextend.
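The dendextend functions named in this answer can be sketched roughly as follows, assuming dendextend is installed (and, optionally, circlize for the circular plot). The toy hc (edit distances between a few of the question's synonyms), the k = 3 cut, and the leaves chosen for pruning are illustrative assumptions, not part of the original answer.

```r
library(dendextend)  # adds prune(), a cutree() method for dendrograms, color_branches()

# Toy tree (stand-in for the real `hc`): a few synonyms clustered by edit distance.
synonyms <- c("beta 1-AR", "beta 2-AR", "beta 3-AR", "B1AR", "Adrb-1",
              "Badm", "ADRARL1", "alpha-1B adrenoceptor",
              "alpha-1D adrenoceptor", "alpha-2AAR", "alpha2-C4",
              "adrenergic receptor")
d <- adist(synonyms, ignore.case = TRUE)
dimnames(d) <- list(synonyms, synonyms)
hc <- hclust(as.dist(d), method = "average")
dend <- as.dendrogram(hc)

# With dendextend loaded, cutree() accepts the dendrogram itself (k = 3 is arbitrary).
groups <- cutree(dend, k = 3)

# Color the branches by the same 3 clusters.
dend_col <- color_branches(dend, k = 3)

# Remove leaves (and their labels) that are not worth displaying.
dend_small <- prune(dend_col, c("Badm", "ADRARL1"))

# Horizontal layout with a wide right margin for the long labels.
par(mar = c(3, 1, 1, 12))
plot(dend_small, horiz = TRUE)

# Circular layout; circlize_dendrogram() needs the circlize package.
if (requireNamespace("circlize", quietly = TRUE)) {
  circlize_dendrogram(dend_small)
}
```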