Tags: r, tidyverse, dendrogram, ggraph, tidygraph

Propagate value from children with tidygraph


I have a tree annotated at the genus level (i.e. each leaf has a name), and I want to propagate the color of the leaves along the branches/edges for as long as the children share the same genus, as in this plot:

enter image description here

Source

My tree is here (sorry, dput doesn't work...) and it looks like this:

library(ggraph)
library(tidygraph)
load("tree_v3")

TBL %>% activate(nodes) %>% as_tibble
# A tibble: 50 x 2
    leaf      Genus
   <lgl>     <fctr>
 1 FALSE         NA
 2  TRUE Klebsiella
 3  TRUE Klebsiella
 4 FALSE         NA
 5  TRUE Klebsiella
 6  TRUE Klebsiella
 7 FALSE         NA
 8 FALSE         NA
 9  TRUE Klebsiella
10 FALSE         NA
# ... with 40 more rows

I can plot the tree with this code, but as you can see, the edge colors stay near the leaves:

TBL %>%
  ggraph('dendrogram') + 
  theme_bw() +
  geom_edge_diagonal2(aes(color = node.Genus)) +
  scale_edge_color_discrete(guide = FALSE) +
  geom_node_point(aes(filter = leaf, color = Genus), size = 2)

enter image description here

There is some code in the "Mapping over searches" section of this blog post, but it doesn't work on my data and I don't understand why:

TBL2 <- TBL %>%
  activate(nodes) %>%
  mutate(Genus = map_bfs_back_chr(node_is_root(), .f = function(node, path, ...) {
    nodes <- .N()
    if (nodes$leaf[node]) return(nodes$Genus[node])
    if (anyNA(unlist(path$result))) return(NA_character_)
    path$result[[1]]
  }))

Error in mutate_impl(.data, dots) : Evaluation error: Cannot coerce values to character(1).

EDIT after Marco Sandri's answer

With mutate(Genus = as.character(Genus)) the error message is gone, but the Genus still doesn't propagate correctly. For instance, look at the third and fourth nodes from the right: their parent is supposed to be NA... (note that it doesn't work in the blog post's plot either).

enter image description here


Solution

  • Genus in TBL is a factor:

    str(TBL %>% activate(nodes) %>% as_tibble)
    
    # Classes ‘tbl_df’, ‘tbl’ and 'data.frame':       50 obs. of  2 variables:
    # $ leaf : logi  FALSE TRUE TRUE FALSE TRUE TRUE ...
    # $ Genus: Factor w/ 10 levels "","Citrobacter",..: NA 6 6 NA 6 6 NA NA 6 NA ...
    

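    but it should be a character vector: map_bfs_back_chr() needs every call of .f to return a character(1), and nodes$Genus[node] returns a factor instead, hence the "Cannot coerce values to character(1)" error.
    After converting Genus from factor to character, the code runs without error.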

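    TBL2 <- TBL %>%
      activate(nodes) %>%
      # convert the factor to character so that .f can return a character(1)
      mutate(Genus = as.character(Genus)) %>%
      # the propagated value is stored in a new node column, Species
      mutate(Species = map_bfs_back_chr(node_is_root(), .f = function(node, path, ...) {
        nodes <- .N()
        # leaves keep their own genus
        if (nodes$leaf[node]) return(nodes$Genus[node])
        # NA as soon as any node already visited below this one is NA
        if (anyNA(unlist(path$result))) return(NA_character_)
        # otherwise take the result of the first node in path
        path$result[[1]]
      }))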
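
The EDIT in the question points out that a parent whose children belong to different genera still receives a genus. This happens because the function above only checks for NA and then takes the first result. One possible way to address it is to compare the results of a node's direct children and return NA unless they all agree. The sketch below does this on a hypothetical seven-node toy tree, not on the asker's data. It assumes, as the tidygraph documentation describes, that path$parent and path$result describe the nodes already visited below the current node. The names toy, toy2 and child_genus are made up for illustration.

```r
library(tidygraph)

# Hypothetical toy tree, not the asker's data: node 1 is the root,
# nodes 2-3 are internal, nodes 4-7 are leaves (edges point parent -> child).
toy <- create_tree(7, children = 2, directed = TRUE) %>%
  activate(nodes) %>%
  mutate(
    leaf  = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
    Genus = factor(c(NA, NA, NA, "Klebsiella", "Klebsiella", "Klebsiella", "Citrobacter"))
  )

toy2 <- toy %>%
  mutate(Genus = as.character(Genus)) %>%  # factor -> character, as in the answer
  mutate(Genus = map_bfs_back_chr(node_is_root(), .f = function(node, path, ...) {
    nodes <- .N()
    # Leaves keep their own genus
    if (nodes$leaf[node]) return(nodes$Genus[node])
    # Results already computed for the direct children of this node
    child_genus <- unlist(path$result[path$parent == node])
    # Propagate only when every child has the same, non-missing genus
    if (anyNA(child_genus) || length(unique(child_genus)) != 1) return(NA_character_)
    child_genus[[1]]
  }))

# Inspect the node table with the propagated Genus column
toy2 %>% activate(nodes) %>% as_tibble()
```

On this toy tree, nodes 4 and 5 share a genus, so their parent (node 2) should inherit it. Node 3 has children from two different genera, so it and the root above it should stay NA. The same function body could be tried on TBL in place of the one above, but it has not been checked against that data.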