Tags: python, networkx, bioconductor, phylogeny, cytoscape

Converting Newick to GraphML using Python


I would like to convert a tree from Newick to a format like GraphML that I can open with Cytoscape.

So, I have a file "small.newick" that contains:

((raccoon:1,bear:6):0.8,((sea_lion:11.9, seal:12):7,((monkey:100,cat:47):20, weasel:18):2):3,dog:25);

So far, I have done it this way (Python 3.6.5, Anaconda):

from Bio import Phylo
import networkx
Tree = Phylo.read("small.newick", 'newick')
G = Phylo.to_networkx(Tree)
networkx.write_graphml(G, 'small.graphml')

image1

There is a problem with the unnamed internal Clade nodes, which I can fix using this code:

from Bio import Phylo
import networkx

def clade_names_fix(tree):
    # Give every unnamed clade (e.g. an internal node) its traversal index as a name.
    for idx, clade in enumerate(tree.find_clades()):
        if not clade.name:
            clade.name = str(idx)

Tree = Phylo.read("small.newick", 'newick')
clade_names_fix(Tree)
G = Phylo.to_networkx(Tree)
networkx.write_graphml(G, 'small.graphml')

This gives me something that seems good enough:

image2

My questions are:

  • Is that a good way to do it? It seems weird to me that the function does not take care of the internal node names.

  • If you replace one node name with a long enough string, it will be trimmed by the command Phylo.to_networkx(Tree). How can I avoid that?

Example: replacing "dog" with "test_tring_that_create_some_problem_later_on"

image3


Solution

  • Looks like you got pretty far on this already. I can only suggest a few alternatives/extensions to your approach...

    1. Unfortunately, I couldn't find a Cytoscape app that can read this format. I tried searching for PHYLIP, NEWICK and PHYLO. You might have more luck.

    2. There is an old Cytoscape 2.x plugin that could read this format, but to run it you would need to install Cytoscape 2.8.3, import the network, export it as XGMML (or save it as CYS), and then open it in Cytoscape 3.7 to migrate back into the land of living code. Then again, if 2.8.3 does what you need for this particular case, maybe you don't need to migrate.

    3. The best approach is programmatic, which you already explored. Finding an R or Python package that turns NEWICK into igraph or GraphML is a solid strategy. Note that there are also up-to-date Cytoscape libraries in those languages, so you can do all the label cleanup, layout, data visualization, analysis, export, etc. within the scripting environment.
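
As a rough illustration of the programmatic route in item 3, the sketch below walks a tree and collects edges labelled with each node's complete name, giving unnamed internal nodes a generated label. It is a minimal sketch under assumptions: the helper full_name_edges, the "internal_" prefix, the default weight of 1.0 and the stand-in Node class are all hypothetical, and Node only mirrors the name, branch_length and clades attributes that Bio.Phylo clades expose. The idea that the name trimming comes from clade objects being turned into strings on export is likewise an assumption, not something confirmed here.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in using the same attribute names as a Bio.Phylo clade."""
    name: Optional[str] = None
    branch_length: Optional[float] = None
    clades: List["Node"] = field(default_factory=list)


def full_name_edges(root, prefix="internal_") -> List[Tuple[str, str, float]]:
    """Return (parent_label, child_label, weight) triples for the tree under root.

    Named nodes keep their complete name (no shortening); unnamed nodes get
    prefix + a running number. Leaf names are assumed to be unique.
    """
    counter = itertools.count()

    def label(node):
        # Called exactly once per node, so every unnamed node gets its own number.
        return node.name if node.name else f"{prefix}{next(counter)}"

    edges = []
    stack = [(root, label(root))]
    while stack:
        parent, parent_label = stack.pop()
        for child in parent.clades:
            child_label = label(child)
            # Assumption: a missing branch length falls back to a weight of 1.0.
            weight = child.branch_length if child.branch_length is not None else 1.0
            edges.append((parent_label, child_label, weight))
            stack.append((child, child_label))
    return edges


# Tiny stand-in tree; with Biopython, tree.root from Phylo.read() could be passed instead.
demo = Node(clades=[
    Node(branch_length=0.8, clades=[Node("raccoon", 1.0), Node("bear", 6.0)]),
    Node("test_tring_that_create_some_problem_later_on", 25.0),
])
for parent, child, weight in full_name_edges(demo):
    print(parent, child, weight)
```

Because the labels are plain strings, the triples can be loaded into a graph with networkx's add_weighted_edges_from and then written with networkx.write_graphml, so the exported node ids no longer depend on how a clade object is converted to text.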