I would like to build a "classical" circular dendrogram plot, such as those displayed in the answer of this post https://biology.stackexchange.com/questions/14152/how-should-i-put-a-large-phylogeny-into-a-scientific-paper using R (I am fairly used to R, but I have never done such plot until now).
I am using environmental metabarcoding datas (Multiple Operation Taxonomic Units, MOTUs). These are DNA sequences, each being identified to a certain taxonomic level, and assigned to it. The resulting data I am considering to build such a plot is the taxonomic "path" of each MOTU, hence the sequence of taxonomic assignation, e.g kingdom;order;class;family;genus;species (example below).
I have tried to different ways of building this plot using igraph
package, both not quite satisfying. Big parts of this code are base on the ggtaxplot
function from ```metabaR`` package (https://github.com/metabaRfactory/metabaR) package.
I think that I could get what I want from both methods, but I'm not sure any of them is really the effective way of doing it. Any help greatly appreciated !
I am not strictly willing to keep on using igraph, I would be fine with any other ggplot-like method (as I wish to add further information, e.g. labels, colors, on the plot).
Many thanks !
All this code were run on R 4.2.1 on Ubuntu 22.04.2
path <- c("root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Fungi incertae sedis@no rank:Mucoromycota@phylum:Mortierellomycotina@subphylum:Mortierellomycetes@class:Mortierellales@order:Mortierellaceae@family:Mortierella@genus:unclassified Mortierella@no rank",
"root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Dikarya@subkingdom:Ascomycota@phylum:saccharomyceta@clade:Pezizomycotina@subphylum:leotiomyceta@clade:sordariomyceta@clade:Sordariomycetes@class:Xylariomycetidae@subclass:Xylariales@order:unclassified Xylariales@no rank:Xylariales sp.@species",
"root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Fungi incertae sedis@no rank:Mucoromycota@phylum:Mortierellomycotina@subphylum:Mortierellomycetes@class:Mortierellales@order:Mortierellaceae@family:Linnemannia@genus:Linnemannia zychae@species",
"root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Fungi incertae sedis@no rank:Mucoromycota@phylum:Mortierellomycotina@subphylum:Mortierellomycetes@class:Mortierellales@order:Mortierellaceae@family",
"root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Dikarya@subkingdom:Ascomycota@phylum:saccharomyceta@clade:Pezizomycotina@subphylum:leotiomyceta@clade:sordariomyceta@clade:Sordariomycetes@class:Hypocreomycetidae@subclass:Hypocreales@order",
"root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Dikarya@subkingdom:Ascomycota@phylum:saccharomyceta@clade:Pezizomycotina@subphylum:leotiomyceta@clade:sordariomyceta@clade:Leotiomycetes@class:Helotiales@order",
"root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Dikarya@subkingdom:Ascomycota@phylum:saccharomyceta@clade:Pezizomycotina@subphylum:leotiomyceta@clade:dothideomyceta@clade:Dothideomycetes@class:Pleosporomycetidae@subclass:Pleosporales@order",
"root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Dikarya@subkingdom:Ascomycota@phylum:saccharomyceta@clade:Pezizomycotina@subphylum:leotiomyceta@clade:sordariomyceta@clade:Leotiomycetes@class:Helotiales@order"
)
I have tried to different ways of building this plot using igraph
package, both not quite satisfying. Big parts of this code are base on the ggtaxplot
function from ```metabaR`` package (https://github.com/metabaRfactory/metabaR) package :
library(metabaR)
library(magrittr)
library(igraph)
library(ggraph)
# Formatting taxonomic information from path
sep.level = ":"; sep.info = "@"
parse <- unname(taxoparser(path, sep.level, sep.info)) %>%
lapply(X =., FUN = function(x){ x <- x[names(x) %in% c("kingdom", "phylum", "class", "order", "family", "genus", "species")]})
path <- sapply(parse, toString)
parse <- strsplit(path, ", ")
parse.mat <- do.call(rbind, lapply(parse, `length<-`, max(lengths(parse))))
# Building edgelist
edgelist <- NULL
for (i in rev(2:ncol(parse.mat))) {
idx <- which(!is.na(parse.mat[, i]))
kid <- parse.mat[idx, i]
parent <- parse.mat[idx, (i - 1)]
kidfull <- apply(parse.mat[idx, 1:i, drop = F], 1, toString)
parentfull <- apply(parse.mat[idx, 1:(i - 1), drop = F], 1, toString)
edgelist <- rbind(
edgelist,
unique(cbind(
parentfull,
kidfull,
parent, kid
))
)
}
Here come my first attempt, using ggraph :
# first attempt, using ggraph
mygraph <- graph_from_data_frame( edgelist[,c("parent", "kid")] )
ggraph(mygraph, layout = 'dendrogram', circular = TRUE) +
geom_edge_elbow() +
geom_node_point() +
theme_void()
I like the "look" of this plot, but all leaves of the dendrogram end up at the same y value, whereas there is different taxonomic levels, which I would like to show.
#second attemp, using igraph
g <- igraph::graph.edgelist(edgelist[rev(1:nrow(edgelist)),
c("parentfull", "kidfull")], directed = F)
# Rename nodes with kid names
igraph::V(g)$name2 <- ifelse(igraph::V(g)$name %in% edgelist[, "parent"],
igraph::V(g)$name,
edgelist[match(igraph::V(g)$name, edgelist[, "kidfull"]), "kid"]
)
# extracting info from igraph object
coords <- layout_as_tree(g, root=1, circular = F, rootlevel = numeric(), mode = "out", flip.y = TRUE)
colnames(coords) <- c("x", "y")
vdf <- data.frame(as.data.frame(get.vertex.attribute(g)), coords)
#vdf$angle <- { 90 - as.numeric(rownames(vdf)) * 360 / nrow(vdf)}
# add segments
edf <- get.data.frame(g)
edf$from.x <- vdf$x[match(edf$from, as.vector(vdf$name))]
edf$from.y <- vdf$y[match(edf$from, as.vector(vdf$name))]
edf$to.x <- vdf$x[match(edf$to, as.vector(vdf$name))]
edf$to.y <- vdf$y[match(edf$to, as.vector(vdf$name))]
ggplot(data = vdf, aes(x = .data$x, y = -.data$y)) +
geom_segment(
data = edf,
aes(
x = .data$from.x, xend = .data$to.x,
y = -.data$from.y, yend = -.data$to.y
), size = 0.2, colour = "grey"
) +
geom_point() +
scale_color_viridis_c() +
theme_void() +
coord_polar() +
geom_text(aes(label = .data$name2),
color = "darkgrey", show.legend = FALSE, hjust = 1)
Here, the structure of the plot is fine, but I don't like the "networky" look of the plot (curve lines), which in my opinion make the relationship between levels unclear.
I don't think you should give up on the ggraph
approach. Your main issue seems to be that the leaf nodes you had were plotted at the wrong depth. However, it is straighforward to calculate the node depth and pass that to the height
argument when creating the plot layout. To make the image a bit clearer, I have added in circles to indicate the toxonomic levels:
library(tidygraph)
as_tbl_graph(mygraph) %>%
mutate(depth = node_distance_from(which(node_is_root()))) %>%
ggraph(layout = 'dendrogram', height = -depth, circular = TRUE) +
ggforce::geom_circle(aes(x0 = 0, y0 = 0 , r = r), color = 'gray90',
data = data.frame(r = seq(0, 1, 1/6))) +
geom_edge_elbow() +
geom_node_point() +
theme_void()