python python-2.7 biopython sequence-alignment

How to filter alignment columns based on a list of positions in Biopython?


Based on the Biopython help page, I can filter the alignment columns down to the first or last 10, and I can even piece together a sub-alignment using

align[:, :10] + align[:, -10:]

align being an MSA object, generated using

from Bio import AlignIO
align = AlignIO.read("Clustalw/opuntia.aln", "clustal")

But is it possible to extract columns based on a list of positions? For example, if I have the following list:

a=[12, 52, 68,45]

Is there a way to extract just these columns from the alignment align?

An R package called bio3d makes it easy to filter an alignment by passing a list as input (filtered_align = align[, a]), but it would be great if I could do this from Python.

Thank you


Solution

  • According to the Biopython docs, you can get column x with

    align[:, x]
    

    So the following should do the job for you:

    from Bio import AlignIO
    
    align = AlignIO.read("Clustalw/opuntia.aln", "clustal")
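    indices = [12, 52, 68, 45]  # 0-based column positions
    columns_as_strings = []
    
    # align[:, column] returns one column as a plain string,
    # with one character per sequence in the alignment
    for column in indices:
        columns_as_strings.append(align[:, column])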
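
The loop above collects each column as a string. If a sub-alignment is wanted instead (closer to what `align[, a]` gives in bio3d), one possible sketch is to take each position as a one-column slice and join the slices with `+`, the same operation the question already uses for `align[:, :10] + align[:, -10:]`. The helper name `select_columns` is hypothetical and not part of Biopython. It assumes the positions are 0-based and non-negative, so positions counted from 1 (as in R) would need 1 subtracted first.

```python
def select_columns(alignment, positions):
    """Return a sub-alignment holding only the given columns, in the order listed.

    Hypothetical helper (not a Biopython function). It assumes `alignment`
    behaves like Bio.Align.MultipleSeqAlignment: `alignment[:, i:j]` yields
    a sub-alignment and `+` joins two alignments column-wise.
    """
    if not positions:
        raise ValueError("positions must contain at least one column index")

    length = alignment.get_alignment_length()
    sub = None
    for pos in positions:
        # A slice such as [-1:0] or [999:1000] would silently come back
        # empty, so reject positions outside the alignment explicitly.
        if pos < 0 or pos >= length:
            raise IndexError("column %d is outside 0..%d" % (pos, length - 1))
        # Slicing with pos:pos + 1 keeps the result an alignment;
        # indexing with a bare integer would return a string instead.
        column = alignment[:, pos:pos + 1]
        sub = column if sub is None else sub + column
    return sub
```

With the alignment from the question, `select_columns(align, [12, 52, 68, 45])` should give a four-column alignment with the columns in the listed order, which can then be printed or saved with `AlignIO.write`.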