Search code examples
rcluster-analysisbioinformaticsphylogenydistance-matrix

Create Phylogeny Tree (Newick File) from Distance Matrix?


I have established my gene clusters and already calculated the distances needed to measure their phylogenetic relationship. I used an algorithm basically gives a measure of distance between gene clusters and is represented in a dataframe such as (Input Example):

BGC1      BGC2     Distance
------------------------------ 
BGC31     BGC34     0.6
BGC34     BGC45     0.7
BGC34     BGC53     0.2
BGC53     BGC31     0.8

x <- data.frame(BGC1 = c('BGC31','BGC34','BGC34','BGC35'), 
                BGC2 = c('BGC34','BGC45','BGC53','BGC51'), 
                distance = c(0.6,0.7,0.2,0.8))

Goal: Would it be possible to construct a tree just based on this type of data? I want to have a .newick file available for this as well, I'm not sure if this is possible using R though.

However, I have been able to create network visualizations from this data through Cytoscape but not possibly a tree. Any further suggestions for this particular example?

Thanks once again for your input :)


Solution

  • Following the suggestion in a comment by user20650 here, you can define how to wrap the distances to a dist object using the lower.tri function. However, the provided example will not work, because it does not provide pairwise distances between samples. The solution thus takes your sample names, generates random data and then constructs the tree with the nj function from the ape package.

    # get all sample names
    x.names = unique(c(levels(x[, 1]), levels(x[, 2])))
    n = length(x.names)
    
    # create all combinations for samples for pairwise comparisons
    x2 = data.frame(t(combn(x.names, m = 2)))
    # generate random distances
    set.seed(4653)
    x2$distance = sample(seq(from = 0.1, to = 1, by = 0.05), size = nrow(x2), replace = TRUE)
    
    # prepare a matrix for pairwise distances
    dst = matrix(NA, ncol = n, nrow = n, dimnames = list(x.names, x.names))
    # fill the lower triangle with the distances obtained elsewhere
    dst[lower.tri(dst)] = x2$distance
    
    # construct a phylogenetic tree with the neighbour-joining method
    library(ape)
    tr = nj(dst)
    plot(tr)
    

    enter image description here

    The newick format of the tree can be saved with ape::write.tree function or printed to the console as:

    cat(write.tree(tr))
    # (BGC53:0.196875,BGC45:0.153125,(((BGC35:0.025,BGC51:0.275):0.1583333333,BGC31:0.2416666667):0.240625,BGC34:0.246875):0.003125);