Tags: python, bioinformatics, biopython, fasta, clustal

BioPython, how to convert from .fasta to .aln for clustal alignment?


I have a .fasta file that I want to convert to .aln so that it can be read with the AlignIO.read command, or otherwise somehow give my FASTA file "Clustal headers". When I pass the FASTA file directly, it just reports that it is not a known Clustal header. Is the return value of "ClustalwCommandline" supposed to do the conversion? The tutorial says to assign its return value to cline and just print cline, and I am not sure what to do with cline after that.

EDIT: I am also supposed to output a .dnd file, and I am not sure how to do that either.


Solution

  • You don't need to convert anything manually: ClustalW reads the FASTA file and writes the .aln file itself. For instance, follow the code below:

    >>> from Bio.Align.Applications import ClustalwCommandline
    

    After importing ClustalwCommandline, you can choose the name of your alignment file. The line below only constructs the command and stores it in cline; nothing is run yet:

    >>> cline = ClustalwCommandline("clustalw", infile="opuntia1.fasta", outfile="opuntia1.aln")
    >>> print(cline)
    clustalw -infile=opuntia1.fasta -outfile=opuntia1.aln
    

    In the following line, calling cline() runs the command constructed above and returns its standard output and standard error, which are assigned to the variables stdout and stderr respectively. Printing stdout shows ClustalW's progress report for the alignment; since the command ran without errors, printing stderr shows nothing. The alignment itself is written to the output file opuntia1.aln. Open that file and you should see the alignment.

    >>> stdout, stderr = cline()
    >>>
    >>> print(stdout)
    
     CLUSTAL 2.1 Multiple Sequence Alignments
    
    
    Sequence format is Pearson
    Sequence 1: CDS         1574 bp
    Sequence 2: EST          723 bp
    Start of Pairwise alignments
    Aligning...
    
    Sequences (1:2) Aligned. Score:  9
    Guide tree file created:   [opuntia1.dnd]
    
    There are 1 groups
    Start of Multiple Alignment
    
    Aligning...
    Group 1:                     Delayed
    Alignment Score 490
    
    CLUSTAL-Alignment file created  [opuntia1.aln]
    
    
    >>> print(stderr)
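
To connect this back to the original error: AlignIO.read with the "clustal" format expects a file that begins with a CLUSTAL header line, which a FASTA file (starting with ">") does not have. The sketch below is only an illustration, assuming Biopython is installed and that opuntia1.aln was written by the run above. The helper names looks_like_clustal and read_alignment are hypothetical, and the header check is deliberately simplified (Biopython's own parser may accept a few other header words).

```python
from pathlib import Path


def looks_like_clustal(path):
    """Return True if the first non-empty line starts with 'CLUSTAL'.

    Hypothetical, simplified check for illustration only.
    """
    for line in Path(path).read_text().splitlines():
        if line.strip():
            return line.lstrip().startswith("CLUSTAL")
    return False  # empty file: no header at all


def read_alignment(path):
    """Parse a ClustalW output file with Biopython's AlignIO."""
    if not looks_like_clustal(path):
        raise ValueError(
            f"{path} has no CLUSTAL header; run ClustalW on the FASTA file first"
        )
    # Imported lazily so the header check works even without Biopython.
    from Bio import AlignIO

    return AlignIO.read(path, "clustal")
```

With the files from this answer, read_alignment("opuntia1.aln") should return the alignment object, while read_alignment("opuntia1.fasta") raises the ValueError instead of failing inside the parser.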
    

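    For the .dnd file, you don't need to specify an output file. By default, running the command also creates a guide tree (.dnd) file named after the input FASTA file, as the "Guide tree file created" line in the output above shows. Here is a direct quote from the tutorial: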

    By default ClustalW will generate an alignment and guide tree file with names based on the input FASTA file, in this case opuntia.aln and opuntia.dnd, but you can override this or make it explicit

    Source: http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec89
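
The default naming described in the quote can be sketched with the standard library alone. This is a rough illustration, not the Biopython API: the helpers default_outputs, build_command and run_clustalw are hypothetical names, and the code assumes a clustalw executable on the PATH. It may also be useful because, to the best of my knowledge, recent Biopython releases flag the command line wrappers as deprecated and suggest calling the tool through subprocess.

```python
import shutil
import subprocess
from pathlib import Path


def default_outputs(fasta_path):
    """Return the (.aln, .dnd) paths derived from the input file name."""
    fasta = Path(fasta_path)
    return fasta.with_suffix(".aln"), fasta.with_suffix(".dnd")


def build_command(fasta_path, exe="clustalw"):
    """Build the same command line that was printed for cline above."""
    aln, _ = default_outputs(fasta_path)
    return [exe, f"-infile={fasta_path}", f"-outfile={aln}"]


def run_clustalw(fasta_path, exe="clustalw"):
    """Run ClustalW and return the alignment and guide tree paths."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    # check=True raises CalledProcessError if ClustalW exits with an error.
    subprocess.run(
        build_command(fasta_path, exe),
        check=True,
        capture_output=True,
        text=True,
    )
    return default_outputs(fasta_path)
```

After aln, dnd = run_clustalw("opuntia1.fasta"), the guide tree could be loaded with Bio.Phylo.read(str(dnd), "newick"), as the tutorial does for its own example.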

    Hope that helps, Cheers!