Search code examples
rdendrogramhclustpheatmap

Merging multiple hclust objects (or dendrograms)


Is there an easy way to merge multiple hclust objects (or dendrograms) at the root?

I've made the example as complete as possible to illustrate my problem.

Let's say I'd like to cluster USArrests by region and then unite all the hclust objects to plot them together in a heatmap.

USArrests
Northeast <- c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", 
"Vermont", "New Jersey", "New York", "Pennsylvania")
Midwest <-  c("Illinois", "Indiana", "Michigan", "Ohio",  "Wisconsin", 
    "Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", 
    "South Dakota")
South <- c("Delaware", "Florida", "Georgia", "Maryland", "North Carolina", 
           "South Carolina", "Virginia", "West Virginia", 
           "Alabama", "Kentucky", "Mississippi", "Tennessee", "Arkansas", 
           "Louisiana", "Oklahoma", "Texas")
West <- c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", 
          "Utah", "Wyoming", "Alaska", "California", "Hawaii", "Oregon", "Washington")

h1 <- hclust(dist(USArrests[Northeast,]))
h2 <- hclust(dist(USArrests[Midwest,]))
h3 <- hclust(dist(USArrests[South,]))
h4 <- hclust(dist(USArrests[West,]))

Now I have 4 hclust objects (h1 through h4). I usually merge them like so:

hc <- as.hclust(merge(merge(merge(
    as.dendrogram(h1), as.dendrogram(h2)), as.dendrogram(h3)), 
    as.dendrogram(h4)))

Then, to plot them, I have to reorder the matrix according to the hclust objects, then plot (I've added some annotation to make the plot clearer):

usarr <- USArrests[c(Northeast, Midwest, South, West),]

region_annotation <- data.frame(Region = c(rep("Northeast", length(Northeast)), 
                                rep("Midwest", length(Midwest)),
                                rep("South", length(South)),
                                rep("West", length(West))),
                                row.names = c(Northeast, Midwest, South, West))

pheatmap(usarr, cluster_rows = hc, 
         annotation_row = region_annotation)

Heatmap results, with some additional graphical parameters for beauty's sake

In summary: is there an easier way to do this than merging all the separate hclusts?


Solution

  • I ended up making a couple of functions to do this more automatically. (In my version, I've also added support for correlation "distance", so it's a little bigger)

    hclust_semisupervised <- function(data, groups, dist_method = "euclidean",
                                      dist_p = 2, hclust_method = "complete") {
        hclist <- lapply(groups, function (group) {
            hclust(dist(data[group,], method = dist_method, p = dist_p), method = hclust_method)
        })
        hc <- .merge_hclust(hclist)
        data_reordered <- data[unlist(groups),]
    
        return(list(data = data_reordered, hclust = hc))
    }
    
    .merge_hclust <- function(hclist) {
        #-- Merge
        d <- as.dendrogram(hclist[[1]])
        for (i in 2:length(hclist)) {
            d <- merge(d, as.dendrogram(hclist[[i]]))
        }
        as.hclust(d)
    }
    

    Having USArrests and the region vectors, I call hclust_semisupervised like so:

    semi_hc <- hclust_semisupervised(USArrests, list(Northeast, Midwest, South, West)
    

    Now plotting the heatmap:

    pheatmap(semi_hc$data, cluster_rows = semi_hc$hclust, 
             annotation_row = region_annotation)