Search code examples
rggplot2cluster-analysisdendrogramvegan

Plotting dendrogram from vegan::meandist() in ggplot


Vegan::meandist() has a really nice plot method that creates a dendrogram of the mean dissimilarities. How can I incorporate the output into ggplot to have full control over the aesthetics? Here is some sample code using Dune. As an example, I'd like to recreate the dendrogram in ggplot and color each Management level by 'Use' (see factors in Dune.env).


# Species and environmental data
require(vegan)

dune <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.spe.txt', row.names = 1)

dune.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.env.txt', row.names = 1)

data(dune) 
data(dune.env)

dune_dist <- vegdist(dune, method = "bray", na.rm=T)

dissim <- meandist(dune_dist, grouping = dune.env$Management) 

plot(dissim)


Solution

  • From ?vegan:::plot.meandist it is clear hclust function is used for kind = "dendrogram". To recreate:

    zz <- hclust(as.dist(dissim), method = "average") #use desired method, "average" is the default in vegan:::plot.meandist
    

    Now to visualize the tree using ggplot:

    library(ggdendro)
    

    create a data.frame from the tree:

    dd <- as.dendrogram(zz) 
    dd <- dendro_data(zz) 
    

    get the diagonal elements from the dissimilarity matrix since they represent within-cluster variability (see @Jari Oksanens comments bellow):

    data.frame(diag = diag(dissim)) %>%
      rownames_to_column("label") -> dissim_diag
    
    dissim_diag
      label      diag
    1    BF 0.4159972
    2    HF 0.4418115
    3    NM 0.6882438
    4    SF 0.5813015
    

    now there is a need to change segment data so the leaves do not end at 0 but at the appropriate distance.

    segment(dd) 
           x         y xend      yend
    1  1.875 0.7412760 1.00 0.7412760
    2  1.000 0.7412760 1.00 0.0000000
    3  1.875 0.7412760 2.75 0.7412760
    4  2.750 0.7412760 2.75 0.5960416
    5  2.750 0.5960416 2.00 0.5960416
    6  2.000 0.5960416 2.00 0.0000000
    7  2.750 0.5960416 3.50 0.5960416
    8  3.500 0.5960416 3.50 0.4736637
    9  3.500 0.4736637 3.00 0.4736637
    10 3.000 0.4736637 3.00 0.0000000
    11 3.500 0.4736637 4.00 0.4736637
    12 4.000 0.4736637 4.00 0.0000000
    

    In other words where x is a whole number and yend is 0 we need to change the yend to the appropriate distance. The following code accomplishes this in two joins. First join adds the label(dd) data and the second join adds dissim_diag data to the segment data:

    segment_data <- segment(dd) %>%
      left_join(
        label(dd),
        by = c("xend" = "x",
               "yend" = "y")) %>%
      left_join(dissim_diag) %>%
      mutate(yend = pmax(yend, diag, na.rm = TRUE)) #use as yend whichever is higher yend or diag, ignoring NA.
    
    segment_data
           x         y xend      yend label      diag
    1  1.875 0.7412760 1.00 0.7412760  <NA>        NA
    2  1.000 0.7412760 1.00 0.6882438    NM 0.6882438
    3  1.875 0.7412760 2.75 0.7412760  <NA>        NA
    4  2.750 0.7412760 2.75 0.5960416  <NA>        NA
    5  2.750 0.5960416 2.00 0.5960416  <NA>        NA
    6  2.000 0.5960416 2.00 0.5813015    SF 0.5813015
    7  2.750 0.5960416 3.50 0.5960416  <NA>        NA
    8  3.500 0.5960416 3.50 0.4736637  <NA>        NA
    9  3.500 0.4736637 3.00 0.4736637  <NA>        NA
    10 3.000 0.4736637 3.00 0.4159972    BF 0.4159972
    11 3.500 0.4736637 4.00 0.4736637  <NA>        NA
    12 4.000 0.4736637 4.00 0.4418115    HF 0.4418115
    

    A similar manipulation is needed to create appropriate label cooridnates:

    text_data <- label(dd) %>%
      left_join(dissim_diag) %>%
      mutate(y = diag,
             group = factor(rep(c("one", "two"),  2))) #just some random groups to color by
    

    Now the actual plot:

    ggplot(segment_data) + 
      geom_segment(aes(x = x,
                       y = y,
                       xend = xend,
                       yend = yend))  +
      theme_dendro() + 
      theme(axis.line.y = element_line(),
            axis.ticks.y = element_line(),
            axis.text.y = element_text()) +
      geom_text(aes(x = x,
                    y = y,
                    label = label,
                    color = group),
                angle = -90, hjust = 0,
                data = text_data) 
    

    enter image description here

    Kudos to @Jari Oksanens for his comments!