Sample file:
Column 10: A|Y|E|A
Column 11: W|I|Q|Q
How do I calculate the amino acid composition (as a percentage) of each column separately? For example, in column 10 the composition is 50% A, 25% E, and 25% Y.
Biopython provides modules to calculate the amino acid composition of an entire file in FASTA format:
from Bio import SeqIO
from Bio.SeqUtils.ProtParam import ProteinAnalysis
for record in SeqIO.parse('output_translation3.fasta', 'fasta'):
    X = ProteinAnalysis(str(record.seq))
    print('\n Results for record: {}'.format(record.id))
    print(X.count_amino_acids()['G'])
    print(X.count_amino_acids()['A'])
    print(X.count_amino_acids()['L'])
    print(X.count_amino_acids()['M'])
from collections import Counter
import re
with open("input.txt") as f:
    for line in f:
        line = line.strip()
        [col, sep, seq] = re.split(r'(: )', line)
        aa = re.split(r'[|]', seq)
        aa_counts = Counter(aa)
        aa_length = len(aa)
        print(col)
        for k, v in aa_counts.items():
            print(" ", k, v/aa_length)
Gives:
Column 10
  A 0.5
  Y 0.25
  E 0.25
Column 11
  W 0.25
  I 0.25
  Q 0.5
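Since the question asks for percentages rather than fractions, the same counting idea could be wrapped in a small function. This is only a sketch: the function name column_composition and the in-memory sample list are made up for illustration, and it assumes every line has the "Column N: A|Y|E|A" layout shown in the sample file.

```python
from collections import Counter


def column_composition(line):
    """Return (column label, {residue: percent}) for one 'Column N: A|Y|E|A' line.

    Hypothetical helper; assumes ': ' separates the label from the residues
    and '|' separates the residues, as in the sample file.
    """
    label, _, residues = line.strip().partition(": ")
    counts = Counter(residues.split("|"))
    total = sum(counts.values())
    return label, {aa: 100 * n / total for aa, n in counts.items()}


# Sample lines copied from the question; a real script would read them from a file.
sample = ["Column 10: A|Y|E|A", "Column 11: W|I|Q|Q"]
results = dict(column_composition(line) for line in sample)

for label, percents in results.items():
    print(label)
    for aa, pct in percents.items():
        print(f"  {aa} {pct:.0f}%")
```

Returning a dictionary per column, instead of printing inside the loop, makes it easier to look up a single residue afterwards, for example results["Column 10"]["A"].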