
Make 0/1 character matrix from random phylogenetic tree in R?


Is it possible to generate 0/1 character matrices like those shown below (right) from bifurcating phylogenetic trees like those on the left? A 1 in the matrix indicates the presence of a shared character that unites a clade.
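As a minimal sketch of the encoding, take a hypothetical five-taxon tree, ((A,B),(C,(D,E))). Each internal node becomes one column, and a taxon gets a 1 in that column if it belongs to the clade below that node. The taxon and column names here are made up for illustration.

```r
# Hypothetical example tree ((A,B),(C,(D,E))): one column per internal node (clade)
char_mat <- matrix(
  c(1, 1, 0, 0,   # A
    1, 1, 0, 0,   # B
    1, 0, 1, 0,   # C
    1, 0, 1, 1,   # D
    1, 0, 1, 1),  # E
  nrow = 5, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D", "E"),
                  c("ABCDE", "AB", "CDE", "DE"))
)
char_mat
```

A rooted, fully bifurcating tree with n tips has n - 1 internal nodes, so the matrix has n - 1 columns, including the all-1 root column.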

The following code generates nice random trees, but I have no idea where to begin turning the results into a character matrix.

library(ape) # Other package solutions are acceptable

forest <- rmtree(N = 2, n = 10, br = NULL)
plot(forest)

To be clear, I can already generate random matrices with the following code and then build and plot trees from them.

library(ape)
library(phangorn)

ntaxa <- 10
nchar <- ntaxa - 1

char_mat <- array(0, dim = c(ntaxa, nchar))

# Column i marks the first (ntaxa + 1 - i) taxa as sharing character i
for (i in 1:nchar) {
  char_mat[,i] <- replace(char_mat[,i], seq(1, (ntaxa+1)-i), 1)
}

char_mat <- char_mat[sample.int(nrow(char_mat)), # Shuffle rows
                     sample.int(ncol(char_mat))] # and cols

# Ensure all branch lengths > 0
dist_mat <- dist.gene(char_mat) + 0.5
upgma_tree <- upgma(dist_mat)
plot(upgma_tree, type = "phylogram")

What I want instead is to generate random trees first and then build the matrices from those trees. The approach above does not produce the right type of matrix.
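One property of the matrix above worth noting (an observation, not necessarily the only problem): its columns always form a single chain of nested clades of sizes n, n - 1, ..., 2. Shuffling rows and columns only relabels taxa and reorders characters, so the implied tree is always fully pectinate (ladder-shaped), with each internal node adding exactly one taxon. A quick base-R check, using an arbitrary seed:

```r
# Rebuild the "staircase" matrix from the question and inspect its clade structure
ntaxa <- 10
nchar <- ntaxa - 1
char_mat <- matrix(0, nrow = ntaxa, ncol = nchar)
for (i in seq_len(nchar)) {
  char_mat[seq_len(ntaxa + 1 - i), i] <- 1   # column i: first (ntaxa + 1 - i) rows
}
set.seed(42)
char_mat <- char_mat[sample.int(ntaxa), sample.int(nchar)]  # shuffle rows and cols

# Clade sizes are 10, 9, ..., 2 no matter how the matrix was shuffled
sizes <- sort(colSums(char_mat), decreasing = TRUE)

# Sort columns largest to smallest; each clade should contain the next one
ord <- order(colSums(char_mat), decreasing = TRUE)
is_chain <- all(vapply(seq_len(nchar - 1), function(k) {
  bigger  <- char_mat[, ord[k]] == 1
  smaller <- char_mat[, ord[k + 1]] == 1
  all(bigger[smaller])   # every member of the smaller clade is in the bigger one
}, logical(1)))

sizes
is_chain
```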

Edit for clarity: I am generating binary character matrices that students can use to draw phylogenetic trees with simple parsimony. A 1 represents a homology that unites taxa into a clade. So every matrix must have one character shared by all rows (a column of all 1s), and some characters must be shared by only two taxa. (I'm ignoring autapomorphies.)
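The rules in the edit can be checked mechanically. Below is a small sketch; check_exercise_matrix() is a hypothetical helper, not part of ape or phangorn. Besides the two stated rules, it tests that every pair of columns is either nested or disjoint, which is the condition for a set of clades to fit on a single rooted tree.

```r
# Hypothetical helper: check a 0/1 matrix against the exercise rules
check_exercise_matrix <- function(m) {
  n <- nrow(m)
  sizes <- colSums(m)
  # Every pair of clades must be nested or disjoint, otherwise no single tree fits
  hierarchical <- TRUE
  if (ncol(m) > 1) {
    for (i in seq_len(ncol(m) - 1)) {
      for (j in (i + 1):ncol(m)) {
        a <- m[, i] == 1
        b <- m[, j] == 1
        shared <- sum(a & b)
        if (shared != 0 && shared != sum(a) && shared != sum(b)) hierarchical <- FALSE
      }
    }
  }
  c(all_taxa_column = any(sizes == n),   # one character shared by every row
    two_taxon_clade = any(sizes == 2),   # some character shared by exactly two taxa
    hierarchical    = hierarchical)
}

good <- cbind(c(1, 1, 1, 1), c(1, 1, 0, 0), c(0, 0, 1, 1))  # ((t1,t2),(t3,t4))
bad  <- cbind(c(1, 1, 1, 1), c(1, 1, 0, 0), c(0, 1, 1, 0))  # {t1,t2} and {t2,t3} overlap
check_exercise_matrix(good)
check_exercise_matrix(bad)
```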

Examples:

[Image: example bifurcating trees (left) with their 0/1 character matrices (right)]


Solution

  • I figured out how to make the matrix using Descendants from the phangorn package. I still have to tweak it with suitable node labels to match the example matrix in the original question, but the framework is there.

    library(ape)
    library(phangorn)
    
    ntaxa <- 8
    nchar <- ntaxa - 1
    
    tree <- rtree(ntaxa, br = NULL)
    
    # Descendants() returns the tips below every node. Drop the first ntaxa
    # elements, which are the individual tips, leaving one set per internal node
    desc <- phangorn::Descendants(tree)[-seq(1, ntaxa)]
    
    char_mat <- array(0, dim = c(ntaxa, nchar))
    
    # Column i gets a 1 for every tip in the clade below internal node i
    for (i in 1:nchar) {
      char_mat[desc[[i]], i] <- 1
    }
    
    rownames(char_mat) <- tree$tip.label
    char_mat
    #>    [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    #> t6    1    1    0    0    0    0    0
    #> t3    1    1    1    0    0    0    0
    #> t7    1    1    1    1    0    0    0
    #> t2    1    1    1    1    1    0    0
    #> t5    1    1    1    1    1    0    0
    #> t1    1    0    0    0    0    1    1
    #> t8    1    0    0    0    0    1    1
    #> t4    1    0    0    0    0    1    0
    
    plot(tree)
    

    Created on 2019-01-28 by the reprex package (v0.2.1)
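If you would rather not depend on Descendants(), the same kind of matrix can be built in base R by walking the tree's edge matrix. This is a sketch assuming ape's usual "phylo" numbering (tips are 1..n, internal nodes are numbered above n, and each row of edge is parent then child); clade_matrix() is a hypothetical helper name. The example uses a hand-built edge matrix so it runs without ape.

```r
# Hypothetical helper: 0/1 clade matrix from an ape-style edge matrix
# (column 1 = parent node, column 2 = child node; tips are numbered 1..ntip)
clade_matrix <- function(edge, ntip, tip_labels = paste0("t", seq_len(ntip))) {
  internal <- sort(unique(edge[, 1]))   # every parent is an internal node
  parent <- integer(max(edge))          # parent[k] = parent of node k (0 = root)
  parent[edge[, 2]] <- edge[, 1]
  mat <- matrix(0L, nrow = ntip, ncol = length(internal),
                dimnames = list(tip_labels, as.character(internal)))
  for (tip in seq_len(ntip)) {
    node <- parent[tip]
    while (node != 0) {                 # climb to the root, marking each clade
      mat[tip, match(node, internal)] <- 1L
      node <- parent[node]
    }
  }
  mat
}

# Hand-built edge matrix for ((t1,t2),(t3,t4)), numbered as ape would (root = 5)
edge <- matrix(c(5, 6,
                 6, 1,
                 6, 2,
                 5, 7,
                 7, 3,
                 7, 4), ncol = 2, byrow = TRUE)
clade_matrix(edge, ntip = 4)
```

With ape loaded, it should be callable on a real tree as clade_matrix(tree$edge, Ntip(tree), tree$tip.label), and on each tree of a forest with lapply(forest, function(tr) clade_matrix(tr$edge, Ntip(tr), tr$tip.label)); that usage is untested here. Columns come out in node-number order, so the root (the all-1 column) is first.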