
R hclust: common order for multiple trees


I am building trees with hclust. I have several distances defined on a common set, and I want to find, as closely as possible, a common leaf order for all of the trees, so that no edges cross in the plots. For example, I would like leaves 1 through 5 to appear on the left side of both trees below.

    x <- seq(1, 10)
    y <- c(1.3, 2.4, 3.6, 4.9, 5.2, 6.9, 7.9, 8.7, 9.6, 10.1)
    X <- hclust(dist(x))
    Y <- hclust(dist(y))
    par(mfrow = c(2, 1))
    plot(X)
    plot(Y)

In general, is there an algorithm to find this order, if one exists? Or, failing that, to find an order for each distance that is close to the others? I understand this can be done with dendrograms using order or sort, but I think dendrogram plots are less informative than hclust plots.

[Figure: plots as hclust]

[Figure: plots as dendrograms]


Solution

  • You could use the sort function from the dendextend package:

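    library(dendextend)  # provides sort() for dendrograms and the %>% pipe

    x <- seq(1, 10)
    y <- c(1.3, 2.4, 3.6, 4.9, 5.2, 6.9, 7.9, 8.7, 9.6, 10.1)
    # Convert to a dendrogram, sort its leaves, then convert back to hclust
    X <- hclust(dist(x)) %>% as.dendrogram %>% sort %>% as.hclust
    Y <- hclust(dist(y)) %>% as.dendrogram %>% sort %>% as.hclust
    par(mfrow = c(2, 1))
    plot(X)
    plot(Y)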
    

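    The function simply tries to sort the leaves by label (here, the index of each leaf). It only rotates branches where the tree allows it, so the clustering itself is unchanged. Converting back with as.hclust keeps the hclust plot style.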
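
For comparison, a similar ordering can be sketched with base R alone. This is not the dendextend approach above but a swapped-in alternative: `reorder()` from the stats package rotates each node so that its branches are sorted by an aggregated leaf weight. Using the leaf index as the weight and `mean` as the aggregation function is an assumption chosen for this example, and `dX`, `dY` and `w` are illustrative names.

```r
# Same data as in the question
x <- seq(1, 10)
y <- c(1.3, 2.4, 3.6, 4.9, 5.2, 6.9, 7.9, 8.7, 9.6, 10.1)

# Weight each leaf by its index in the common set (an assumption for this sketch)
w <- seq_along(x)

# reorder() only rotates branches; the clustering itself is unchanged
dX <- reorder(as.dendrogram(hclust(dist(x))), wts = w, agglo.FUN = mean)
dY <- reorder(as.dendrogram(hclust(dist(y))), wts = w, agglo.FUN = mean)

# Compare the resulting left-to-right leaf orders of the two trees
print(order.dendrogram(dX))
print(order.dendrogram(dY))

# Convert back to hclust to keep the original plot style
par(mfrow = c(2, 1))
plot(as.hclust(dX))
plot(as.hclust(dY))
```

A crossing-free common order may not exist for every pair of trees. When it does not, rotating branches this way can only bring the leaf orders closer together, so it is worth comparing the printed orders before relying on the plots.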