I've done a gene prediction using SoftBerry and it returns output like this:
Predicted protein(s):
>FGENESH:[mRNA] 1 12 exon (s) 4267 - 6782 1296 bp, chain +
ATGATACGCACTGCGCTTTCACGAGCAGCGGCCATCGTCGCCGCCCGCACCTCCGCCAAG
CTCCGCCCTTCCCTCCTCGCTCGATCTCCGCCGTCCAGACTCCTCCACGATGGGATTAAC
GCCAACCCAGTTGCTCTTCAGATGATCAACTACGCCGTCTCTCTCGCCAGGTCTCAGAAA
>FGENESH: 1 12 exon (s) 4267 - 6782 431 aa, chain +
MIRTALSRAAAIVAARTSAKLRPSLLARSPPSRLLHDGINANPVALQMINYAVSLARSQK
SDESYGQAQLVLEQCLSSQPSEGQDLATHNSRAMVLMAMSTLLSERGKLDEAIEKLQKVE
etc: an extensive output so manual editing is not trivial. I need to fish out the sequences that start with '>FGENESH:[mRNA]'. So, I try this:
for line in infile:
if line.startswith('>FGENESH:[mRNA]'):
print(line)
outfile.write(line)
Which gives me only the header lines:
>FGENESH:[mRNA] 1 12 exon (s) 4267 - 6782 1296 bp, chain +
However, I would like the output to look like this:
>FGENESH:[mRNA] 1 12 exon (s) 4267 - 6782 1296 bp, chain +
ATGATACGCACTGCGCTTTCACGAGCAGCGGCCATCGTCGCCGCCCGCACCTCCGCCAAG
CTCCGCCCTTCCCTCCTCGCTCGATCTCCGCCGTCCAGACTCCTCCACGATGGGATTAAC
GCCAACCCAGTTGCTCTTCAGATGATCAACTACGCCGTCTCTCTCGCCAGGTCTCAGAAA
Could anyone advice me how to obtain that? I would be much obliged - being a novice and all. Thank you.
jd
flag = False
for line in infile:
if flag is True:
if line.startswith('>'):
flag = False
else:
outfile.write(line)
if line.startswith('>FGENESH:[mRNA]'):
flag = True
outfile.write(line)