Search code examples
pythontreebioinformaticsphylogeny

Constructing a phylogentic tree


I have a list of list of lists like this

matches = [[['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', 'Firmicutes'], ['class', 'Clostridia'], ['order', 'Clostridiales'], ['family', 'Lachnospiraceae'], ['genus', 'Lachnospira']], 
           [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', '"Proteobacteria"'], ['class', 'Gammaproteobacteria'], ['order', '"Vibrionales"'], ['family', 'Vibrionaceae'], ['genus', 'Catenococcus']], 
           [['rootrank', 'Root'], ['domain', 'Archaea'], ['phylum', '"Euryarchaeota"'], ['class', '"Methanomicrobia"'], ['order', 'Methanomicrobiales'], ['family', 'Methanomicrobiaceae'], ['genus', 'Methanoplanus']]]

And I want to construct a phylogenetic tree from them. I wrote a node class like so (based partially on this code):

class Node(object):
    """Generic n-ary tree node object
    Children are additive; no provision for deleting them."""

    def __init__(self, parent, category=None, name=None):
        self.parent = parent
        self.category = category
        self.name = name
        self.childList = []

        if  parent is None:
            self.birthOrder  =  0
        else:
            self.birthOrder  =  len(parent.childList)
            parent.childList.append(self)

    def fullPath(self):
        """Returns a list of children from root to self"""
        result  =  []
        parent  =  self.parent
        kid     =  self

        while parent:
            result.insert(0, kid)
            parent, kid  =  parent.parent, parent

        return result

    def ID(self):
        return '{0}|{1}'.format(self.category, self.name)

And then I try to construct my tree like this:

node = None
for match in matches:
    for branch in match:
        category, name = branch
        node = Node(node, category, name)
        print [n.ID() for n in node.fullPath()]

This works for the first match, but when I start with the second match it is appended at the end of the tree instead of starting again at the top. How would I do that? I tried some variations on searching for the ID, but I can't get it to work.


Solution

  • The issue is that node is always the bottommost node in the tree, and you are always appending to that node. You need to store the root node. Since ['rootrank', 'Root'] appears at the beginning of each of the lists, I'd recommend pulling that out and using it as the root. So you can do something like:

    rootnode = Node(None, 'rootrank', 'Root')
    for match in matches:
        node = rootnode
        for branch in match:
            category, name = branch
            node = Node(node, category, name)
            print [n.ID() for n in node.fullPath()]
    

    This will make the matches list more readable, and gives the expected output.