
How to print the order of hierarchical clustering in R?


Using the following matrix of distances between 6 Italian cities:

0   662 877 255 412 996
662 0   295 468 268 400
877 295 0   754 564 138
255 468 754 0   219 869
412 268 564 219 0   669
996 400 138 869 669 0

Can R output the order in which it clustered them? For example, single linkage would tell you:

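City 3 and City 6 (distance 138), followed by
City 4 and City 5 (219), followed by
City 1 joining City 4/5 (255), followed by
City 2 joining City 1/4/5 (268), and finally
City 3/6 joining City 1/2/4/5 (295, the link between City 2 and City 3).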

It is important that I get a numeric output rather than having to read it off a dendrogram.
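To double-check the expected single-linkage order without relying on hclust, here is a naive base-R sketch. The function name single_linkage_steps is hypothetical and only for illustration: it starts with every city as its own cluster, repeatedly merges the two clusters whose closest members are nearest (the single-linkage rule), and records each merge with its distance. It rescans all cluster pairs at every step, so it is only meant for small matrices like this one.

```r
# Distance matrix from the question (cities 1..6)
m <- matrix(c(
    0, 662, 877, 255, 412, 996,
  662,   0, 295, 468, 268, 400,
  877, 295,   0, 754, 564, 138,
  255, 468, 754,   0, 219, 869,
  412, 268, 564, 219,   0, 669,
  996, 400, 138, 869, 669,   0), nrow = 6, byrow = TRUE)

# Hypothetical helper: naive single-linkage agglomeration that records
# which clusters were joined at each step and at what distance.
single_linkage_steps <- function(m) {
  clusters <- as.list(seq_len(nrow(m)))  # each city starts as its own cluster
  left <- character(0); right <- character(0); height <- numeric(0)
  while (length(clusters) > 1) {
    best_d <- Inf
    best <- c(NA_integer_, NA_integer_)
    # Single linkage: cluster distance = smallest pairwise member distance
    for (a in 1:(length(clusters) - 1)) {
      for (b in (a + 1):length(clusters)) {
        d <- min(m[clusters[[a]], clusters[[b]]])
        if (d < best_d) {
          best_d <- d
          best <- c(a, b)
        }
      }
    }
    A <- clusters[[best[1]]]
    B <- clusters[[best[2]]]
    left   <- c(left,  paste(sort(A), collapse = "/"))
    right  <- c(right, paste(sort(B), collapse = "/"))
    height <- c(height, best_d)
    clusters[[best[1]]] <- c(A, B)   # merged cluster replaces the first one
    clusters[[best[2]]] <- NULL      # and the second one is dropped
  }
  data.frame(left = left, right = right, height = height,
             stringsAsFactors = FALSE)
}

res <- single_linkage_steps(m)
print(res)
```

Each row is one merge, from the first (closest pair) to the last, so reading the rows top to bottom gives the clustering order as numbers.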


Solution

  • I don't know of a complete solution to your problem, but you could use the merge component of the object returned by hclust.

    From ?hclust:

    merge: an n-1 by 2 matrix. Row i of ‘merge’ describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation -j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm. Thus negative entries in ‘merge’ indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.

    Your example:

    d <- as.dist(read.table(textConnection("
    0   662 877 255 412 996
    662 0   295 468 268 400
    877 295 0   754 564 138
    255 468 754 0   219 869
    412 268 564 219 0   669
    996 400 138 869 669 0")))
    
    hc <- hclust(d, method="single")
    
    plot(hc)
    


    hc$merge
    
    #     [,1] [,2]  # from bottom up
    #[1,]   -3   -6  # City 3 and 6
    #[2,]   -4   -5  # City 4 and 5
    #[3,]   -1    2  # join City 1 and City 4/5
    #[4,]   -2    3  # join City 2 and City 1/4/5
    #[5,]    1    4  # join City 3/6 and City 1/2/4/5
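Building on hc$merge, the sketch below turns each step into the list of cities on each side, following the rule quoted from ?hclust (negative entry = single observation, positive entry = cluster formed at that earlier step), and pairs every step with hc$height, the distance at which that merge happened. The helper names members and merge_steps are hypothetical, written for this example; only hclust, hc$merge and hc$height come from R itself.

```r
# Rebuild the example distance matrix and the single-linkage clustering
d <- as.dist(read.table(text = "
0   662 877 255 412 996
662 0   295 468 268 400
877 295 0   754 564 138
255 468 754 0   219 869
412 268 564 219 0   669
996 400 138 869 669 0"))
hc <- hclust(d, method = "single")

# Hypothetical helper: expand one entry of hc$merge into observations.
# A negative entry -j is observation j; a positive entry j is the
# cluster formed at the earlier step j (see ?hclust).
members <- function(hc, j) {
  if (j < 0) return(-j)
  sort(c(members(hc, hc$merge[j, 1]), members(hc, hc$merge[j, 2])))
}

# Hypothetical helper: one row per merge step, with both sides and the height
merge_steps <- function(hc) {
  steps <- seq_len(nrow(hc$merge))
  side <- function(i, k) paste(members(hc, hc$merge[i, k]), collapse = "/")
  data.frame(
    step   = steps,
    left   = vapply(steps, side, character(1), k = 1),
    right  = vapply(steps, side, character(1), k = 2),
    height = hc$height,
    stringsAsFactors = FALSE
  )
}

steps <- merge_steps(hc)
print(steps)
cat(sprintf("Step %d: cities %s + cities %s at distance %g\n",
            steps$step, steps$left, steps$right, steps$height), sep = "")
```

Note that hc$order is something different: it gives the left-to-right order of the leaves in the plotted dendrogram, not the order in which the merges happened.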