r ggplot2 ggtree

Coloring lines in tanglegram based on position of nodes


I am creating tanglegrams with the following code:

library(ggtree)
library(ape)
library(dplyr)  # needed for bind_rows(), filter() and %>%

tree1 <- read.tree(text='(((A:4.2,B:4.2):3.1,C:7.3):6.3,D:13.6);')
tree2 <- read.tree(text='(((B:4.2,A:4.2):3.1,C:7.3):6.3,D:13.6);')

p1 <- ggtree(tree1)
p2 <- ggtree(tree2)

d1 <- p1$data
d2 <- p2$data

d2$x <- max(d2$x) - d2$x + max(d1$x) + 1

pp <- p1 + geom_tree(data=d2)

dd <- bind_rows(d1, d2) %>% 
  filter(!is.na(label))

final_plot <- pp + geom_line(aes(x, y, group=label), data=dd, color='grey')

What I want to do is color the connecting lines based on the position of the tips. In other words, if a line is straight, meaning the tip has the same position in both trees, it should get one color; if the position has changed, it should get another.

It would also be nice to get a legend for this to explain the colors.


Solution

  • You can add a column to dd that flags whether each connecting line will be horizontal. Here I group by label and check whether the number of distinct node ids is 1. Then you map that column to the color aesthetic of the line.

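    # TRUE when a label has the same node id in both trees
    dd <- dd %>% group_by(label) %>% mutate(is.horiz = n_distinct(node) == 1)
    pp + 
      geom_line(aes(x, y, group=label, color = is.horiz), data=dd) +
      scale_color_manual(values = c('TRUE' = "lightblue", 'FALSE' = "purple")) +
      # ggplot2 >= 3.5.0 deprecates a numeric legend.position; there, use
      # legend.position = "inside" with legend.position.inside = c(.9,.9)
      theme(legend.position = c(.9,.9)) +
      labs(color = 'Horizontal Nodes')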
    

    You can adjust the line colors, the legend title, and the legend position to taste.
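To see what the grouping step computes without drawing anything, here is a minimal sketch on a hand-built data frame. `dd_toy` and its values are made up to mimic the tip rows of `dd` and are not taken from a real ggtree object. The sketch also makes two changes to the answer, both of which are my own assumptions rather than part of it: it compares `y` instead of `node`, since `y` is what decides whether the line is drawn flat and node ids may not always track plotted position, and it stores a descriptive string so the legend reads more naturally than TRUE/FALSE.

```r
library(dplyr)

# Hypothetical stand-in for `dd`: one row per tip per tree.
# A and B swap vertical positions between the two trees; C and D do not.
dd_toy <- data.frame(
  label = c("A", "B", "C", "D", "A", "B", "C", "D"),
  tree  = rep(c("tree1", "tree2"), each = 4),
  x     = rep(c(13.6, 14.6), each = 4),
  y     = c(1, 2, 3, 4, 2, 1, 3, 4)
)

# For each label, the line is flat only if both endpoints share one y value.
dd_flagged <- dd_toy %>%
  group_by(label) %>%
  mutate(link = if_else(n_distinct(y) == 1, "same position", "moved")) %>%
  ungroup()

print(dd_flagged)
```

With the real data, the same `mutate()` call can be applied to `dd`, and `link` can then be mapped with `color = link` in `geom_line()`, using a named vector such as `c("same position" = "lightblue", "moved" = "purple")` in `scale_color_manual()`.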