I have established my gene clusters and already calculated the distances needed to measure their phylogenetic relationship. I used an algorithm basically gives a measure of distance between gene clusters and is represented in a dataframe such as (Input Example):
BGC1 BGC2 Distance
------------------------------
BGC31 BGC34 0.6
BGC34 BGC45 0.7
BGC34 BGC53 0.2
BGC53 BGC31 0.8
x <- data.frame(BGC1 = c('BGC31','BGC34','BGC34','BGC35'),
BGC2 = c('BGC34','BGC45','BGC53','BGC51'),
distance = c(0.6,0.7,0.2,0.8))
Goal: Would it be possible to construct a tree just based on this type of data? I want to have a .newick file available for this as well, I'm not sure if this is possible using R though.
However, I have been able to create network visualizations from this data through Cytoscape but not possibly a tree. Any further suggestions for this particular example?
Thanks once again for your input :)
Following the suggestion in a comment by user20650 here, you can define how to wrap the distances to a dist
object using the lower.tri
function. However, the provided example will not work, because it does not provide pairwise distances between samples. The solution thus takes your sample names, generates random data and then constructs the tree with the nj
function from the ape
package.
# get all sample names
x.names = unique(c(levels(x[, 1]), levels(x[, 2])))
n = length(x.names)
# create all combinations for samples for pairwise comparisons
x2 = data.frame(t(combn(x.names, m = 2)))
# generate random distances
set.seed(4653)
x2$distance = sample(seq(from = 0.1, to = 1, by = 0.05), size = nrow(x2), replace = TRUE)
# prepare a matrix for pairwise distances
dst = matrix(NA, ncol = n, nrow = n, dimnames = list(x.names, x.names))
# fill the lower triangle with the distances obtained elsewhere
dst[lower.tri(dst)] = x2$distance
# construct a phylogenetic tree with the neighbour-joining method
library(ape)
tr = nj(dst)
plot(tr)
The newick format of the tree can be saved with ape::write.tree
function or printed to the console as:
cat(write.tree(tr))
# (BGC53:0.196875,BGC45:0.153125,(((BGC35:0.025,BGC51:0.275):0.1583333333,BGC31:0.2416666667):0.240625,BGC34:0.246875):0.003125);