Hello everyone, I need some help:
I do not know if you are familiar with phylogenetic trees, but here is an example:
   /-YP_001604167.1
  |
  |--YP_001604351.1
--|
  |      /-seq_TAG2_Canis_taurus
  |   /-|
  |  |   \-seq_TAG2_Canis_austracus
   \-|
     |   /-YP_001798528.1
      \-|
        |   /-YP_009173671.1
         \-|
           |   /-seq_TAG1_Mus_musculus
            \-|
              |   /-seq_TAG1_Mus_griseus
               \-|
                 |   /-seq_TAG2_Canis_canis
                  \-|
                    |   /-seq_TAG2_Canis_familiaris
                     \-|
                        \-seq_TAG2_Canis_lupus
This tree is encoded in a specific format called Newick:
'(YP_001604167.1,YP_001604351.1,((seq_TAG2_Canis_austracus,seq_TAG2_Canis_taurus),(YP_001798528.1,(YP_009173671.1,(seq_TAG1_Mus_musculus,(seq_TAG1_Mus_griseus,(seq_TAG2_Canis_lupus,(seq_TAG2_Canis_familiaris,seq_TAG2_Canis_canis))))))));'
The tree ends with a semicolon. The bottommost node in this tree is an interior node, not a tip. Interior nodes are represented by a pair of matched parentheses. Between them are representations of the nodes (seq_names) that are immediately descended from that node, separated by commas.
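To make the format concrete, here is a minimal stdlib-only sketch of a Newick reader that turns a topology-only string into nested Python tuples (a string is a tip, a tuple is an interior node holding its children). `parse_newick` is a hypothetical helper, not part of any library, and it assumes trees shaped like the example: no branch lengths, no interior labels, no quoted names.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples.

    Tips become strings and each interior node becomes a tuple of its
    children. Branch lengths, interior labels and quoted names are NOT
    supported (assumption: input looks like the example in this post).
    """
    text = text.strip().strip("'")  # the example above is wrapped in quotes
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '('
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            return tuple(children)
        # A tip: read its name up to the next delimiter.
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos].strip()

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return tree


print(parse_newick("(A,(B,C));"))  # ('A', ('B', 'C'))
```

Applied to the full string above, the result is a 3-tuple: the two YP_ ids followed by the tuple for the nested part, because the outermost parentheses hold three children.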
So if I have something like:
(A,(B,C));
then it means that B and C are more closely related to each other, and A is the most distant.
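One way to read "more closely related" off the nesting is to look at the smallest clade containing a set of tips (the clade below their most recent common ancestor, MRCA): the smaller that clade, the closer the tips. A plain-Python sketch, assuming the tree is hand-encoded as nested tuples; `leaves` and `mrca_clade` are hypothetical helper names.

```python
def leaves(node):
    """Set of tip names below `node` (a str is a tip, a tuple an interior node)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def mrca_clade(tree, names):
    """Tip set of the smallest clade containing every name in `names`."""
    names = set(names)
    if not names <= leaves(tree):
        raise ValueError(f"not in the tree: {names - leaves(tree)}")
    node = tree
    # Keep descending while a single child still contains all the names.
    while not isinstance(node, str):
        child = next((c for c in node if names <= leaves(c)), None)
        if child is None:
            break  # the names are split across children: node is their MRCA
        node = child
    return leaves(node)


tree = ("A", ("B", "C"))  # (A,(B,C));
print(sorted(mrca_clade(tree, {"B", "C"})))  # ['B', 'C']
print(sorted(mrca_clade(tree, {"A", "B"})))  # ['A', 'B', 'C']
```

B and C share a clade of 2 tips, while any pair involving A only meets at the root (3 tips). In ete3, `TreeNode.get_common_ancestor` plays the role of the MRCA lookup.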
The goal of my question is to find a way, using Python for instance, to count the number of groups of sequences with the same "TAG_number" that are closer to each other than to any node with another TAG_number or a YP_number.
For instance, TAG2 is represented in 2 groups: in the first, (seq_TAG2_Canis_taurus, seq_TAG2_Canis_austracus) are together, and in the second, (seq_TAG2_Canis_canis, (seq_TAG2_Canis_familiaris, seq_TAG2_Canis_lupus)) are together. For TAG1, as you can see, the two sequences are not nested together, because seq_TAG1_Mus_griseus is closer to the group (seq_TAG2_Canis_canis, (seq_TAG2_Canis_familiaris, seq_TAG2_Canis_lupus)) than it is to the other TAG1 sequence, seq_TAG1_Mus_musculus.
So the result should be something like:
groups for TAG_1 : 0
groups for TAG_2 : 2
I know that some packages in Python or R can tell whether the sequences of a TAG_number form a "monophyletic group", but I have found nothing that gives the number of groups in the tree when the sequences of a TAG_number are split across it.
Do you have any idea how to do that? Thank you very much.
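One way to get these counts, sketched in plain Python under two assumptions: the tree is already available as nested tuples (a string is a tip, a tuple an interior node), and a tip's tag is the first `TAG<digits>` substring of its name (YP_ tips get no tag). The idea is to walk the tree bottom-up, collect the set of tags under every node, and count a "pure" clade (a single tag, at least two tips) only where its parent mixes tags, so each maximal group is counted once. Single tips such as the two TAG1 sequences are not counted. `tag_of` and `count_tag_groups` are hypothetical names.

```python
import re
from collections import Counter

# The example tree, hand-encoded as nested tuples.
TREE = (
    "YP_001604167.1",
    "YP_001604351.1",
    (
        ("seq_TAG2_Canis_austracus", "seq_TAG2_Canis_taurus"),
        ("YP_001798528.1",
         ("YP_009173671.1",
          ("seq_TAG1_Mus_musculus",
           ("seq_TAG1_Mus_griseus",
            ("seq_TAG2_Canis_lupus",
             ("seq_TAG2_Canis_familiaris", "seq_TAG2_Canis_canis")))))),
    ),
)


def tag_of(name):
    """Return 'TAG<digits>' found in a tip name, or None (e.g. YP_ ids)."""
    match = re.search(r"TAG\d+", name)
    return match.group(0) if match else None


def count_tag_groups(tree):
    """Count maximal clades of >= 2 tips whose tips all share one tag."""
    counts = Counter()

    def record(tags, n_tips):
        # A pure clade (one non-None tag) with at least two tips is a group.
        if len(tags) == 1 and n_tips >= 2:
            tag = next(iter(tags))
            if tag is not None:
                counts[tag] += 1

    def visit(node):
        """Return (set of tags below node, number of tips below node)."""
        if isinstance(node, str):
            return {tag_of(node)}, 1
        results = [visit(child) for child in node]
        tags = set().union(*(child_tags for child_tags, _ in results))
        n_tips = sum(n for _, n in results)
        if len(tags) > 1:
            # Mixed node: every pure child is a maximal group, record it here.
            for child_tags, child_n in results:
                record(child_tags, child_n)
        return tags, n_tips

    record(*visit(tree))  # the whole tree could itself be one pure group
    return counts


counts = count_tag_groups(TREE)
for tag in ("TAG1", "TAG2"):
    print(f"groups for {tag} : {counts[tag]}")
```

With ete3, a comparable count should come from keeping only the nodes yielded by `get_monophyletic` that have at least two leaves (`len(node.get_leaves()) >= 2`), since that method also yields single tips.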
Other part of the question:
Now I have a species phylogeny such as:
  |         /-Canis_taurus
  |      /-|
  |     |   \-Canis_austracus
  |   /-|
  |  |  |   /-Canis_africus
  |  |   \-|
  |  |     |   /-Canis_familiaris
   \-|      \-|
     |         \-Canis_lupus
     |
     |   /-Canis_canis
      \-|
         \-Lupus_lupus
The idea is, for each monophyletic group found in the previous step, to find the MRCA of its species in the species phylogeny and count the number of nodes (species) in the clade formed by that MRCA.
So I have 2 groups.
The first:
#    /-TAG2, seq_TAG2_Canis_austracus
# --|
#    \-TAG2, seq_TAG2_Canis_taurus
#
Here Canis_austracus and Canis_taurus share an MRCA in the species phylogeny, and this ancestor forms the clade composed of 2 species (Canis_austracus and Canis_taurus).
So Nb species within the species phylogenetic tree = 2
The second:
#    /-TAG2, seq_TAG2_Canis_lupus
# --|
#   |   /-TAG2, seq_TAG2_Canis_familiaris
#    \-|
#       \-TAG2, seq_TAG2_Canis_canis
Here the 3 taxa share an MRCA, and this ancestor forms the clade composed of all the species in the species phylogeny (7).
So Nb species within the species phylogenetic tree = 7
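A minimal stdlib sketch of this second step. It assumes the species tree drawn above is exactly (((Canis_taurus,Canis_austracus),(Canis_africus,(Canis_familiaris,Canis_lupus))),(Canis_canis,Lupus_lupus)); (my reading of the drawing, with the species names spelled as in the gene tree), and that the species is everything after the second underscore of a sequence name. The two groups are hard-coded from the previous step; `species_name` and `mrca_clade_size` are hypothetical helpers.

```python
# Assumed topology of the species tree (7 species).
SPECIES_TREE = (
    (("Canis_taurus", "Canis_austracus"),
     ("Canis_africus", ("Canis_familiaris", "Canis_lupus"))),
    ("Canis_canis", "Lupus_lupus"),
)

# The two TAG2 groups found in the gene tree.
GROUPS = [
    ["seq_TAG2_Canis_austracus", "seq_TAG2_Canis_taurus"],
    ["seq_TAG2_Canis_lupus", "seq_TAG2_Canis_familiaris", "seq_TAG2_Canis_canis"],
]


def species_name(seq_name):
    """Hypothetical mapping: 'seq_TAG2_Canis_lupus' -> 'Canis_lupus'."""
    return seq_name.split("_", 2)[2]


def leaves(node):
    """Set of tip names below `node` (a str is a tip, a tuple an interior node)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def mrca_clade_size(species_tree, species):
    """Number of species in the clade rooted at the MRCA of `species`."""
    species = set(species)
    missing = species - leaves(species_tree)
    if missing:
        raise ValueError(f"not in the species tree: {missing}")
    node = species_tree
    while not isinstance(node, str):
        child = next((c for c in node if species <= leaves(c)), None)
        if child is None:
            break  # species are split across children: node is their MRCA
        node = child
    return len(leaves(node))


for group in GROUPS:
    species = {species_name(s) for s in group}
    print(sorted(species), mrca_clade_size(SPECIES_TREE, species))
```

Under these assumptions the first group maps to a 2-species clade and the second to the whole 7-species tree, matching the numbers above.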
Maybe get_monophyletic of ete3 is what you need? http://etetoolkit.org/docs/latest/reference/reference_tree.html?highlight=get_monophyletic#ete3.TreeNode.get_monophyletic
from ete3 import Tree
import re

# build tree
t = Tree("(YP_001604167.1,YP_001604351.1,"
         "((seq_TAG2_Canis_austracus,seq_TAG2_Canis_taurus),"
         "(YP_001798528.1,(YP_009173671.1,(seq_TAG1_Mus_musculus,"
         "(seq_TAG1_Mus_griseus,(seq_TAG2_Canis_lupus,"
         "(seq_TAG2_Canis_familiaris,seq_TAG2_Canis_canis))))))));")

# set tag as leaf attribute
for leaf in t:
    # get tag from name (TAG followed by one or more digits; None for YP_ ids)
    tag = re.search(r'TAG[0-9]+', leaf.name)
    tag = tag.group(0) if tag else None
    leaf.add_features(tag=tag)

# show the whole tree
print(t.get_ascii(attributes=["name", "tag"], show_internal=False))

# show all monophyletic groups for tag=TAG2
for node in t.get_monophyletic(values=["TAG2"], target_attr="tag"):
    print(node.get_ascii(attributes=["tag", "name"], show_internal=False))
#    /-TAG2, seq_TAG2_Canis_austracus
# --|
#    \-TAG2, seq_TAG2_Canis_taurus
#
#    /-TAG2, seq_TAG2_Canis_lupus
# --|
#   |   /-TAG2, seq_TAG2_Canis_familiaris
#    \-|
#       \-TAG2, seq_TAG2_Canis_canis