I'm having trouble using Biopython to count each base in a FASTA file and sum the counts. For example, what is the total number of A's in the whole file, and the total number of T's?
I've run into two problems.
1.
from Bio import SeqIO
from Bio.Seq import Seq

handle2="/home/koreanraichu/sra_data_mo.fasta"
for record2 in SeqIO.parse(handle2,"fasta"):
    print(Seq(record2.seq).count("A"))
    print(type(Seq(record2.seq).count("A")))
This code reads each sequence and counts its adenines successfully, but it never adds the counts together: each record's count (an int) is printed separately. I tried appending the counts to a list and calling sum(), and also simply adding them, but neither worked.
2.
for record2 in SeqIO.parse(handle2,"fasta"):
    if len(record2.seq) > 100:
        i=0
        i=i+len(record2.seq)
    else:
        j=0
        j=j+len(record2.seq)
print(i,j)
Like the code above, this doesn't work. It is meant to be a conditional sum that totals the lengths of DNA sequences of 100 bp or more and of less than 100 bp separately, but it only prints the last record's data.
What can I do to solve these problems?
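Try this code for the first problem. Note that record2.seq is already a Seq object, so there is no need to wrap it in Seq(...), and calling .count() on it returns an int. The code counts each base within a record and adds the four counts together: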
from Bio import SeqIO
# from Bio.Seq import Seq  # not needed: record2.seq is already a Seq
handle2="Fasta.fa"
for record2 in SeqIO.parse(handle2,"fasta"):
    # print(record2.seq, type(record2.seq))
    # print(str(record2.seq), type(str(record2.seq)))
    print(record2.seq.count("A"))
    # print(type(record2.seq).count("A")) ### --> TypeError: count() missing 1 required positional argument: 'sub'
    summarize = 0  # reset for every record, so this is a per-record total
    for i in 'ATGC':
        x = record2.seq.count(i)
        print(i, ' : ', x)
        summarize += x  # reuse the count instead of counting again
    print(summarize)
Given this test FASTA file (Fasta.fa):
>Rosalind_4402
GCAGCTAGCTAGCTAGCTGGGATTCGGATCGGCGCCCCGAGAGGATTCTTTCAGCTGTAA
GAATTTATCCTCGATCGGGCTATAAAACCTACGCATATCTGCTAGCTGAGGGGCTATCTT
Output:
27
A : 27
T : 32
G : 32
C : 29
120
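The code above sums the four base counts within one record, and summarize is reset for each record. To answer the original question (total A, total T, and so on across the whole file), the running totals have to be created once, before the loop over records. A minimal sketch, assuming each sequence is a plain str or a Bio.Seq.Seq (both provide .count(sub)); the function name total_base_counts is hypothetical:

```python
from collections import Counter
from typing import Iterable


def total_base_counts(sequences: Iterable, bases: str = "ATGC") -> Counter:
    """Return per-base totals summed over all sequences.

    Hypothetical helper: accepts str or Bio.Seq.Seq objects, since both
    provide .count(sub). Counting is case-sensitive.
    """
    totals = Counter()  # created once, outside the loop, so counts accumulate
    for seq in sequences:
        for base in bases:
            totals[base] += seq.count(base)
    return totals


# Example with plain strings standing in for record.seq values
totals = total_base_counts(["AATG", "TTAC"])
print(totals["A"], totals["T"], sum(totals.values()))  # 3 3 8
```

With Biopython you could call it as totals = total_base_counts(rec.seq for rec in SeqIO.parse("Fasta.fa", "fasta")) and then read totals["A"] and totals["T"]. Because counting is case-sensitive, lowercase (soft-masked) bases are not included unless you uppercase the sequences first.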
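For the second problem, create the counters i and j once, before the loop. In your version, i=0 and j=0 run on every iteration, so each total is reset and ends up holding only the length of the most recent record in that category: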
from Bio import SeqIO
# from Bio.Seq import Seq
# handle2="/home/koreanraichu/sra_data_mo.fasta"
handle2="Fasta2.fa"
# Initialize the running totals once, before the loop
i=0
j=0
for record2 in SeqIO.parse(handle2,"fasta"):
    if len(record2.seq) > 100:
        print('>100 : ', len(record2.seq))
        i=i+len(record2.seq)
    else:
        print('else : ', len(record2.seq))
        j=j+len(record2.seq)
print('> 100 summarize : ', i, ' else summarize : ',j)
Given this test FASTA file (Fasta2.fa):
>Rosalind_4402
GCAGCTAGCTAGCTAGCTGGGATTCGGATCGGCGCCCCGAGAGGATTCTTTCAGCTGTAA
GAATTTATCCTCGATCGGGCTATAAAACCTACGCATATCTGCTAGCTGAGGGGCTATCTT
>Rosalind_4403
GCAGCTAGCTAGCTAGCTGGGATTCGGATCGGCGCCCCGAGAGGATTCTTTCAGCTGTAA
GAATTTATCCTCGATCGGGCTATAAAACCTACGCATATCTGCTAGCTGAGGGGCTATCTT
GCAGCTAGCTAGCTAGCTGGGATTCGGATCGGCGCCCCGAGAGGATTCTTTCAGCTGTAA
GAATTTATCCTCGATCGGGCTATAAAACCTACGCATATCTGCTAGCTGAGGGGCTATCTT
>Rosalind_4404
GCAGCTAGCTAGCTAGCTGGGATTCGGATCGGCGCCCCGAGAGGATTCTTTCAGCTGTAA
>Rosalind_4405
GCAGCTAGCTAGCTAGCTGGGATTCGGATCGGCGCCCCGAGAGGATT
>Rosalind_4406
GCAGCTAGCTAGCTAGCTGGGATTCGGATCGGCGCCCCGAGAGGATTCTTTCAGCTGTAA
GAATTTATCCTCGATCGGGCTATAAAACCTACGCATATCTGCTAGCTGAGGGGCTATCTT
CTTTCAGCTGTAAGAATTTATCCTCGATCGGGCTATAAAACCTACGCATATCTGCTAGCT
GAGGGGCTATCTT
>Rosalind_4407
GCAGCTAGCTAGCTAGCTGGGATT
>Rosalind_4408
GCAGCTAGCTAGCTAGCTGGGATTCGGATCGGCGCCCCGAGAGGATTCTTTCAGCTGTAA
GAATTTATCCTCGATCGGGCTATAAAACCTACGCATATCTGCTAGCTGAGGGGCTATCTT
CTTTCAGCTGTAAGAATTTATCCTCGATCGGGCTATAAAACCTACGCATATCTGCTAGC
Output:
>100 : 120
>100 : 240
else : 60
else : 47
>100 : 193
else : 24
>100 : 179
> 100 summarize : 732 else summarize : 131
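The same fix can be wrapped in a function so the threshold is a parameter rather than a hard-coded 100. A minimal sketch; split_length_totals is a hypothetical name, it accepts str or Bio.Seq.Seq objects (anything with len()), and, like the code above, it puts sequences of exactly threshold bp into the second (else) total:

```python
from typing import Iterable, Tuple


def split_length_totals(sequences: Iterable, threshold: int = 100) -> Tuple[int, int]:
    """Return (total length of sequences longer than threshold,
    total length of all other sequences)."""
    long_total = 0   # both accumulators start at 0 once, before the loop
    short_total = 0
    for seq in sequences:
        length = len(seq)
        if length > threshold:
            long_total += length
        else:
            short_total += length
    return long_total, short_total


# Plain strings with the same lengths as the records in Fasta2.fa
lengths = [120, 240, 60, 47, 193, 24, 179]
print(split_length_totals("A" * n for n in lengths))  # (732, 131)
```

With Biopython you could pass rec.seq for rec in SeqIO.parse("Fasta2.fa", "fasta") as the argument. If sequences of exactly 100 bp should count toward the first total (as in "100 bp or more"), change > to >=.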