How to get all leaf nodes from a directed subtree using igraph in R?


Given a directed tree (an igraph graph) constructed from a data frame containing a symbolic edge list:

library(igraph)
library(ggraph)

#Create a symbolic edge list.
edgelist_df <- data.frame("from" = c("A", "A", "A", "B", "B", "B", "C", "D", "D", "E", 
                                     "E", "F", "G", "G", "H", "I", "I", "J", "J", "J"),
                          "to"   = c("B", "C", "D", "E", "F", "G", "H", "I", "J", "K", 
                                     "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U"))

#Create a directed tree from this edgelist. 
graph <- graph_from_data_frame(d = edgelist_df, directed = TRUE)

Plot the tree, here using the ggraph() function from the ggraph package.

ggraph(graph = graph, 
       layout = 'dendrogram', 
       circular = FALSE) +
  geom_edge_diagonal() +
  geom_node_point() +
  geom_node_text(aes(label = name),
                 angle = 0,
                 hjust = 1.5,
                 nudge_y = 0,
                 size = 5) +
  theme_void()

The question is how to return a character vector containing the names of all leaf nodes from a subtree that is specified by one node, representing the root node of that subtree. For example:

  • If node = "B", then all leaf nodes that are part of the subtree with root "B" are: "K", "L", "M", "N" and "O".
  • If node = "H", then all leaf nodes that are part of the subtree with root "H" are: "P".
  • If node = "A", then all leaf nodes that are part of the subtree with root "A" (which is the original tree) are: "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T" and "U".

Solution

  • You can define a function f using distances() and degree(), as below

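    f <- function(g, r) {
      # Vertices reachable from r along outgoing edges have a finite distance.
      reachable <- is.finite(distances(g, v = r, mode = "out")[1, ])
      # Leaves have no outgoing edges; this also excludes a root with one child.
      names(V(g))[reachable & degree(g, mode = "out") == 0]
    }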
    

    which gives

    > f(graph, "B")
    [1] "K" "L" "M" "N" "O"
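
As a cross-check, the same idea can be sketched with subcomponent(), which collects the vertices reachable from a given node without computing distances. This is only one possible variant, not part of the original answer: the helper name leaves_of is made up for illustration, and the block rebuilds the graph from the question so that it runs on its own. Results are sorted so the order does not depend on traversal order.

```r
library(igraph)

# Same edge list as in the question.
edgelist_df <- data.frame(
  from = c("A", "A", "A", "B", "B", "B", "C", "D", "D", "E",
           "E", "F", "G", "G", "H", "I", "I", "J", "J", "J"),
  to   = c("B", "C", "D", "E", "F", "G", "H", "I", "J", "K",
           "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U")
)
graph <- graph_from_data_frame(d = edgelist_df, directed = TRUE)

# leaves_of() is a hypothetical helper name, not an igraph function.
leaves_of <- function(g, root) {
  # All vertices reachable from `root` along outgoing edges (includes `root`).
  sub <- subcomponent(g, v = root, mode = "out")
  # Keep only the vertices with no outgoing edge, i.e. the leaves.
  is_leaf <- degree(g, v = sub, mode = "out") == 0
  sort(as_ids(sub)[is_leaf])
}

leaves_of(graph, "B")  # "K" "L" "M" "N" "O"
leaves_of(graph, "H")  # "P"
leaves_of(graph, "A")  # "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U"
```

These three calls reproduce the three examples listed in the question. Note that if the given node is itself a leaf, it is returned as the only leaf of its own subtree.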