Search code examples
rbioinformaticsphylogeny

Calculate sum of the total phylogenetic branch length from species birth-death table


BACKGROUND

This question is a bit complex so I will first introduce the background. To generate an example of species birth-death table (the L table) I would suggest to use dd_sim() function from DDD package.

library(DDD)
library(tidyverse)
library(picante)

result <- dd_sim(c(0.2, 0.1, 20), 10) 
# with birth rate 0.2, death rate 0.1, carrying capacity 20 and overall 10 million years.
L <- result$L

L

            [,1] [,2] [,3]       [,4]
 [1,] 10.0000000    0   -1 -1.0000000
 [2,] 10.0000000   -1    2  2.7058965
 [3,]  8.5908774    2    3  6.6301616
 [4,]  8.4786474    3    4  3.3866813
 [5,]  8.4455262   -1   -5 -1.0000000
 [6,]  8.3431071    4    6  3.5624756
 [7,]  5.3784683    2    7  0.6975934
 [8,]  3.8950593    6    8 -1.0000000
 [9,]  1.5032100   -5   -9 -1.0000000
[10,]  0.8393589    7   10 -1.0000000
[11,]  0.6118985   -5  -11 -1.0000000

The L table has 4 columns:

  1. the first column is the time at which a species is born in Mya
  2. the second column is the label of the parent of the species; positive and negative values indicate whether the species belongs to the left or right crown lineage
  3. the third column is the label of the daughter species itself; positive and negative values indicate whether the species belongs to the left or right crown lineage
  4. the fourth column is the time of extinction of the species; if the fourth element equals -1, then the species is still extant.

WHAT I DID

With this L table, I now have a extant community. I want to calculate its phylogenetic diversity (also called Faith's index)

Using DDD and picante functions I can do it:

# convert L table into community data matrix
comm = as.data.frame(L) %>% dplyr::select(V3:V4) %>%
dplyr::rename(id = V3, pa = V4) %>%
dplyr::mutate(id = paste0("t",abs(id))) %>%
dplyr::mutate(pa = dplyr::if_else(pa == -1, 1, 0)) %>%
dplyr::mutate(plot = 0) %>%
dplyr::select(plot, pa, id) %>%
picante::sample2matrix()

# convert L table into phylogeny
phy = DDD::L2phylo(L, dropextinct = T)

# calculate Faith's index using pd() function
Faith = picante::pd(comm,phy)

PROBLEM

Although I achieved my goal, the procedure seems to be redundant and time-consuming. I have to convert my original L table back and forth because I have to use existing functions.

By definition Faith's index is basically the sum of the total phylogenetic branch length of the community, so my question is:

Is it possible to calculate Faith's index directly from the L table?

Thank you advance!


Solution

  • You can simply use the phy$edge.length component of the phylo object generated by DDD::L2phylo:

    ## Measuring the sum of the branch lengths from `phy`
    sum_br_length <- sum(phy$edge.length)
    sum_br_length == Faith$PD
    # [1] TRUE
    
    ## Measuring the sum of the branch length from `L`
    sum_br_length <- sum(DDD::L2phylo(L, dropextinct = TRUE)$edge.length)
    sum_br_length == Faith$PD
    # [1] TRUE
    

    And some micro-benchmarking for fun:

    library(microbenchmark)
    ## Function 1
    fun1 <- function(L) {
        comm = as.data.frame(L) %>% dplyr::select(V3:V4) %>%
        dplyr::rename(id = V3, pa = V4) %>%
        dplyr::mutate(id = paste0("t",abs(id))) %>%
        dplyr::mutate(pa = dplyr::if_else(pa == -1, 1, 0)) %>%
        dplyr::mutate(plot = 0) %>%
        dplyr::select(plot, pa, id) %>%
        picante::sample2matrix()
    
        # convert L table into phylogeny
        phy = DDD::L2phylo(L, dropextinct = T)
    
        # calculate Faith's index using pd() function
        Faith = picante::pd(comm,phy)
        return(Faith$PD)
    }
    ## Function 2
    fun2 <- function(L) {
        phy <- DDD::L2phylo(L, dropextinct = T)
        return(sum(phy$edge.length))
    }
    ## Function 3
    fun3 <- function(L) {
        return(sum(DDD::L2phylo(L, dropextinct = TRUE)$edge.length))
    }
    
    ## Do all of them give the same results
    fun1(L) == Faith$PD
    # [1] TRUE
    fun2(L) == Faith$PD
    # [1] TRUE
    fun3(L) == Faith$PD
    # [1] TRUE
    
    ## Which function fastest?
    microbenchmark(fun1(L), fun2(L), fun3(L))
    # Unit: milliseconds
    #     expr      min       lq     mean   median       uq       max neval
    #  fun1(L) 6.486462 6.900641 8.273386 7.445334 8.667535 16.888429   100
    #  fun2(L) 1.627854 1.683204 2.215531 1.771219 2.229408  9.522366   100
    #  fun3(L) 1.630635 1.663181 2.229206 1.859733 2.448196  7.573001   100