I have created a function that parses a Fasta file because I needed to remove some odd characters. Now I have a dictionary and want to turn it back to a fasta format. I am new to Fasta files so I don't know how to proceed.
The dictionary has this format:
{'NavAb:/1126': 'TNIVESSFFTKFIIYLIVLNGITMGLETSKTFMQSFGVYTTLFNQIVITIFTIEIILRIYVHRISFFKDPWSLFDFFVVAISLVPTSSGFEILRVLRVLRLFRLVTAVPQMRKI', 'Shaker:/1656': 'SSQAARVVAIISVFVILLSIVIFCLETLEDEVPDITDPFFLIETLCIIWFTFELTVRFLACPLNFCRDVMNVIDIIAIIPYFITTLNLLRVIRLVRVFRIFKLSRHSKGLQIL', .....
The function:
def parse_file(input_file):
parsed_seqs = {}
curr_seq_id = None
curr_seq = []
for line in newfile:
line = line.strip()
line = line.replace('-', '')
if line.startswith(">"):
if curr_seq_id is not None:
parsed_seqs[curr_seq_id] = ''.join(curr_seq)
curr_seq_id = line[1:]
curr_seq = []
continue
curr_seq.append(line)
parsed_seqs[curr_seq_id] = ''.join(curr_seq)
return parsed_seqs
newfile = open("file")
parsed_seqs = parse_file(newfile)
print(parsed_seqs)
If you can use an existing library for this task, you may use Biotite:
import biotite.sequence.io.fasta as fasta
seq_dict = {
'NavAb:/1126': 'TNIVESSFFTKFIIYLIVLNGITMGLETSKTFMQSFGVYTTLFNQIVITIFTIEIILRIYVHRISFFKDPWSLFDFFVVAISLVPTSSGFEILRVLRVLRLFRLVTAVPQMRKI',
'Shaker:/1656': 'SSQAARVVAIISVFVILLSIVIFCLETLEDEVPDITDPFFLIETLCIIWFTFELTVRFLACPLNFCRDVMNVIDIIAIIPYFITTLNLLRVIRLVRVFRIFKLSRHSKGLQIL'
}
fasta_file = fasta.FastaFile()
for header, seq_str in seq_dict.items():
fasta_file[header] = seq_str
fasta_file.write("path/to/file.fasta")
path/to/file.fasta
:
>NavAb:/1126
TNIVESSFFTKFIIYLIVLNGITMGLETSKTFMQSFGVYTTLFNQIVITIFTIEIILRIYVHRISFFKDPWSLFDFFVVA
ISLVPTSSGFEILRVLRVLRLFRLVTAVPQMRKI
>Shaker:/1656
SSQAARVVAIISVFVILLSIVIFCLETLEDEVPDITDPFFLIETLCIIWFTFELTVRFLACPLNFCRDVMNVIDIIAIIP
YFITTLNLLRVIRLVRVFRIFKLSRHSKGLQIL
Note that I belong to the developers of this package. There are also solutions in a multitude of other packages, such as Biopython.