Vegan::meandist() has a really nice plot method that creates a dendrogram of the mean dissimilarities. How can I incorporate the output into ggplot to have full control over the aesthetics? Here is some sample code using Dune. As an example, I'd like to recreate the dendrogram in ggplot and color each Management level by 'Use' (see factors in Dune.env).
# Species and environmental data
require(vegan)
dune <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.spe.txt', row.names = 1)
dune.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.env.txt', row.names = 1)
data(dune)
data(dune.env)
dune_dist <- vegdist(dune, method = "bray", na.rm=T)
dissim <- meandist(dune_dist, grouping = dune.env$Management)
plot(dissim)
From ?vegan:::plot.meandist
it is clear hclust
function is used for kind = "dendrogram"
. To recreate:
zz <- hclust(as.dist(dissim), method = "average") #use desired method, "average" is the default in vegan:::plot.meandist
Now to visualize the tree using ggplot:
library(ggdendro)
create a data.frame from the tree:
dd <- as.dendrogram(zz)
dd <- dendro_data(zz)
get the diagonal elements from the dissimilarity matrix since they represent within-cluster variability (see @Jari Oksanens comments bellow):
data.frame(diag = diag(dissim)) %>%
rownames_to_column("label") -> dissim_diag
dissim_diag
label diag
1 BF 0.4159972
2 HF 0.4418115
3 NM 0.6882438
4 SF 0.5813015
now there is a need to change segment data so the leaves do not end at 0 but at the appropriate distance.
segment(dd)
x y xend yend
1 1.875 0.7412760 1.00 0.7412760
2 1.000 0.7412760 1.00 0.0000000
3 1.875 0.7412760 2.75 0.7412760
4 2.750 0.7412760 2.75 0.5960416
5 2.750 0.5960416 2.00 0.5960416
6 2.000 0.5960416 2.00 0.0000000
7 2.750 0.5960416 3.50 0.5960416
8 3.500 0.5960416 3.50 0.4736637
9 3.500 0.4736637 3.00 0.4736637
10 3.000 0.4736637 3.00 0.0000000
11 3.500 0.4736637 4.00 0.4736637
12 4.000 0.4736637 4.00 0.0000000
In other words where x is a whole number and yend is 0 we need to change the yend to the appropriate distance. The following code accomplishes this in two joins. First join adds the label(dd)
data and the second join adds dissim_diag
data to the segment data:
segment_data <- segment(dd) %>%
left_join(
label(dd),
by = c("xend" = "x",
"yend" = "y")) %>%
left_join(dissim_diag) %>%
mutate(yend = pmax(yend, diag, na.rm = TRUE)) #use as yend whichever is higher yend or diag, ignoring NA.
segment_data
x y xend yend label diag
1 1.875 0.7412760 1.00 0.7412760 <NA> NA
2 1.000 0.7412760 1.00 0.6882438 NM 0.6882438
3 1.875 0.7412760 2.75 0.7412760 <NA> NA
4 2.750 0.7412760 2.75 0.5960416 <NA> NA
5 2.750 0.5960416 2.00 0.5960416 <NA> NA
6 2.000 0.5960416 2.00 0.5813015 SF 0.5813015
7 2.750 0.5960416 3.50 0.5960416 <NA> NA
8 3.500 0.5960416 3.50 0.4736637 <NA> NA
9 3.500 0.4736637 3.00 0.4736637 <NA> NA
10 3.000 0.4736637 3.00 0.4159972 BF 0.4159972
11 3.500 0.4736637 4.00 0.4736637 <NA> NA
12 4.000 0.4736637 4.00 0.4418115 HF 0.4418115
A similar manipulation is needed to create appropriate label cooridnates:
text_data <- label(dd) %>%
left_join(dissim_diag) %>%
mutate(y = diag,
group = factor(rep(c("one", "two"), 2))) #just some random groups to color by
Now the actual plot:
ggplot(segment_data) +
geom_segment(aes(x = x,
y = y,
xend = xend,
yend = yend)) +
theme_dendro() +
theme(axis.line.y = element_line(),
axis.ticks.y = element_line(),
axis.text.y = element_text()) +
geom_text(aes(x = x,
y = y,
label = label,
color = group),
angle = -90, hjust = 0,
data = text_data)
Kudos to @Jari Oksanens for his comments!