I have been stuck on a problem for three days... searched everywhere, posted on Biostar, still waiting for EMBL to respond to emails... would make a bounty if I had more rep.
After aligning sequences with EMBOSSwin needle()
(pairwise global alignments) I get alignment files in pair
format, with a .needle
file extension. I want to use Biopython to read these alignments for later analysis.
I use AlignIO.read(open('alignment.needle'),'emboss')
following the instructions in Biopython's AlignIO wiki but I keep getting an AssertionError
.
My code:
>>> from Bio import AlignIO
>>> alignment = AlignIO.read(open("data/all/out/pair1_alignment.needle"), "emboss")
My error:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "C:\Python27\lib\Bio\AlignIO\__init__.py", line 423, in read
first = next(iterator)
File "C:\Python27\lib\Bio\AlignIO\__init__.py", line 370, in parse
for a in i:
File "C:\Python27\lib\Bio\AlignIO\EmbossIO.py", line 150, in __next__
assert seq.replace("-", "") != ""
AssertionError
Example Alignment File:
Download the alignment file here
Versions:
Clues:
I suspect this may be related to a warning message I kept getting when actually making the alignments, which was outputted by EMBOSS needle()
function:
Warning: Sequence character string not found in ajSeqCvtKS
Duplicate post on BioStars, http://www.biostars.org/p/87226/#87399
This appears to be down to a subtle change in the EMBOSS output. You have an extremely old version, EMBOSS version 2.10.0 (February 2005), and your output file has lines like this:
gag 1288 -------------------------------------------------- 1287
Using a newer version of EMBOSS (e.g. 6.3.0), gives lines like this:
gag 1287 -------------------------------------------------- 1287
The Biopython parser is expecting the latter for alignment sections with no letters (e.g. when one sequence is much longer than the other), where the start and end coordinates agree. Please update your copy of EMBOSS, and then the parser should be happy. The current EMBOSS release is version 6.5.0.