Colouring labels based on external dataset in ggtree


I have a phylogenetic tree in .tre format and an accompanying dataset. The exact form of the tree does not matter; it is just a random phylogenetic tree. The dataset has two columns: names and colours.

When plotting such a tree, I would like to add coloured points (two different colours) to the tips, taken from that accompanying dataset. The problem is that when I use the following piece of code:

ggtree(RANDOMTREE) + geom_tippoint(pch=16, col=RANDOMDATA$color) + geom_tiplab(offset=0.1)

it colours the points, but the colours follow the row order of the accompanying dataset, of course.

But I would like to match the colours by species name, pairing each tip in the tree with its row in the dataset (the names have the same format but are in a different order). I have not figured that out yet. Can you please help me with this?

Thank you very much.

Example code:

# biocLite() is retired; current Bioconductor releases install via BiocManager
install.packages("BiocManager")
BiocManager::install("ggtree")
library(ggtree)

tree<-read.tree(text="(spec1,((spec2,(spec9,(spec3,spec5))),spec8,(spec6,(spec7,spec4))));")
dataset1<-data.frame("name" = c("spec1","spec2","spec3","spec4","spec5","spec6","spec7","spec8","spec9"), "colour" = c("red","red","blue","red","red","blue","blue","red","blue"))

ggtree(tree) + geom_tiplab() + geom_tippoint(pch=16, col=as.factor(dataset1$colour))

What I get: wrongly labeled tree

What I would like to get: correctly labeled tree


Solution

  • I can get the right grouping, but not the right colours, off the bat:

    p <- ggtree(tree) + geom_tiplab()
    p <- p %<+% dataset1 + geom_tippoint(pch=16, aes(col=colour))
    p
    

    I used this post for reference: https://aschuerch.github.io/posts/2017-04-24-blog-post-1. The package's documentation is poor. You could get the colours you want by swapping "red" and "blue" in the dataset :p

    It's treating the values in the colour column as category labels and pairing them, in order, with a built-in colour scale, rather than using them as literal colours. So if the scale starts with (red, blue) and your categories come out as (blue, red), they get matched up in that order. Make sense?

    edit: Installing this package was a nightmare. If there is a simpler package, such as https://cran.r-project.org/web/packages/data.tree/vignettes/data.tree.html, I'd suggest trying it instead. It uninstalled many of my core packages, e.g. dplyr and data.table, and it has a ridiculous number of dependencies.
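
Following up on the colour-scale explanation: rather than swapping values in the dataset, one option is to tell ggplot2 to use the colour column literally. The sketch below is not part of the original answer. It assumes the colour column holds valid R colour names stored as character strings, and that ggplot2's scale_colour_identity() is an acceptable fix. The plotting part is guarded so the data checks still run without ggtree installed.

```r
# Same example data as in the question; colours kept as plain character strings
# (an assumption: on R < 4.0 data.frame() would otherwise turn them into factors).
tree_text <- "(spec1,((spec2,(spec9,(spec3,spec5))),spec8,(spec6,(spec7,spec4))));"
dataset1 <- data.frame(
  name   = paste0("spec", 1:9),
  colour = c("red", "red", "blue", "red", "red", "blue", "blue", "red", "blue"),
  stringsAsFactors = FALSE
)

# The identity scale only makes sense if every value is a colour R recognises.
stopifnot(all(dataset1$colour %in% colours()))

if (requireNamespace("ggtree", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggtree)
  library(ggplot2)

  tree <- ape::read.tree(text = tree_text)

  # %<+% attaches dataset1 to the tree data by matching its first column
  # against the tip labels, so row order in dataset1 no longer matters.
  p <- ggtree(tree) %<+% dataset1 +
    geom_tiplab(offset = 0.1) +
    geom_tippoint(aes(colour = colour), shape = 16) +
    scale_colour_identity()  # use "red"/"blue" as literal colours

  print(p)
}
```

If the column held group labels rather than colour names, a named vector passed to scale_colour_manual(values = ...) would be the more usual choice, since it keeps the mapping from label to colour explicit.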