How to get a consensus of multiple sequence alignments using Biopython?


I am trying to get a consensus sequence from my multiple sequence alignment files (FASTA format).

I have a few FASTA files, each containing multiple sequence alignments. When I run the function below, I get AttributeError: 'generator' object has no attribute 'get_alignment_length'.

I haven't been able to find any code examples for this using AlignIO.parse; I only saw examples using AlignIO.read.

def get_consensus_seq(filename):

    alignments = (AlignIO.parse(filename,"fasta"))
    summary_align = AlignInfo.SummaryInfo(alignments)
    consensus_seq = summary_align.dumb_consensus(0.7,"N")
    print(consensus_seq)

Solution

  • If I understand your situation correctly, the problem is that AlignIO.parse returns a generator of alignments, while SummaryInfo expects a single MultipleSeqAlignment, hence the AttributeError. The alignments need to be merged into one first:

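    from __future__ import annotations
    from itertools import chain
    from pathlib import Path

    from Bio import AlignIO
    from Bio.Align import MultipleSeqAlignment
    from Bio.Align.AlignInfo import SummaryInfo
    from Bio.Seq import Seq


    def get_consensus_seq(filename: Path | str) -> Seq:
        # AlignIO.parse yields one MultipleSeqAlignment per alignment in the file.
        # Chain their records into a single alignment; all rows must have the
        # same length, otherwise MultipleSeqAlignment raises an error.
        common_alignment = MultipleSeqAlignment(
            chain(*AlignIO.parse(filename, "fasta"))
        )
        summary = SummaryInfo(common_alignment)
        # Columns where no residue reaches a 0.7 share are reported as "N".
        consensus = summary.dumb_consensus(0.7, "N")
        return consensus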
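
To make the 0.7 threshold and the "N" placeholder more concrete, here is a plain-Python sketch of a column-wise threshold consensus. It is not Biopython's implementation: `threshold_consensus` is a hypothetical helper written for illustration, and skipping gap characters and treating ties as ambiguous are assumptions about the intended behaviour.

```python
from collections import Counter


def threshold_consensus(rows, threshold=0.7, ambiguous="N"):
    """Column-wise consensus of equal-length aligned strings (illustrative only)."""
    if not rows:
        return ""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*rows):
        # Assumption: gap characters do not count towards the residue total.
        residues = [c for c in column if c not in "-."]
        if not residues:
            consensus.append(ambiguous)
            continue
        counts = Counter(residues).most_common()
        top_residue, top_count = counts[0]
        # Assumption: a tie for the most common residue is ambiguous.
        tie = len(counts) > 1 and counts[1][1] == top_count
        if not tie and top_count / len(residues) >= threshold:
            consensus.append(top_residue)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


rows = ["ACGT", "ACGA", "ACTA", "AC-A"]
print(threshold_consensus(rows))  # ACNA
```

In the third column G makes up only 2 of the 3 non-gap residues, which is below 0.7, so that position becomes "N". In the fourth column A makes up 3 of 4, so it is kept.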