I have a text file:
>name_1
data_1
>name_2
data_2
>name_3
data_3
>name_4
data_4
>name_5
data_5
I want to store the headers (name_1, name_2, ...) in one list and the data (data_1, data_2, ...) in another list in a Python program.
def parse_fasta_file(fasta):
    desc=[]
    seq=[]
    seq_strings = fasta.strip().split('>')
    for s in seq_strings:
        if len(s):
            sects = s.split()
            k = sects[0]
            v = ''.join(sects[1:])
            desc.append(k)
            seq.append(v)

for l in sys.stdin:
    data = open('D:\python\input.txt').read().strip()
    parse_fasta_file(data)
    print seq
This is the code I have tried, but I am not able to get the answer.
The most fundamental error is trying to access a variable outside of its scope.
def function(stuff):
    seq = whatever
function('data')
print seq ############ error
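Concretely, running that pattern raises a NameError. The sketch below uses Python 3 syntax, replaces the placeholder whatever with an arbitrary string operation (an assumption, purely for illustration), and catches the error so the script runs to the end:

```python
def function(stuff):
    seq = stuff.upper()  # seq is a local variable of function
    # no return statement, so the call evaluates to None

result = function('data')
print(result)  # None: the local seq was discarded when function returned

try:
    print(seq)  # there is no seq at module level
    seq_visible = True
except NameError as err:
    seq_visible = False
    print('NameError:', err)
```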
You cannot access seq outside of function. The usual way to do this is to have function return a value, and capture it in a variable within the caller.
def function(stuff):
    seq = whatever
    return seq
s = function('data')
print s
(I have deliberately used different variable names inside the function and outside. Inside function you cannot access s or data, and outside, you cannot access stuff or seq. Incidentally, it would be quite okay, but confusing to a beginner, to use a different variable with the same name seq in the mainline code.)
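To see that a seq inside the function and a seq in the mainline code really are separate variables, here is a small Python 3 sketch; the string values are made up for illustration:

```python
seq = 'outer'  # module-level variable named seq

def function(stuff):
    seq = stuff + '!'  # a new local variable; it does not touch the module-level seq
    return seq

s = function('data')
print(s)    # data!
print(seq)  # outer
```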
With that out of the way, we can attempt to write a function which returns a list of sequences and a list of descriptions for them.
def parse_fasta(lines):
    descs = []
    seqs = []
    data = ''
    for line in lines:
        if line.startswith('>'):
            if descs:  # finished the previous record, push its sequence (even if empty)
                seqs.append(data)
            data = ''
            descs.append(line[1:].rstrip('\r\n'))  # Trim '>' from beginning and line ending from end
        else:
            data += line.rstrip('\r\n')
    # there will be yet one more to push when we run out
    if descs:
        seqs.append(data)
    return descs, seqs
This isn't particularly elegant, but should get you started. A better design would be to return a list of (description, data) tuples where the description and its data are closely coupled together.
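A minimal sketch of that tuple-based design, in Python 3. The name parse_fasta_records is hypothetical, and the sample lines are taken from the question. If you still want two separate lists, they can be rebuilt from the tuples:

```python
def parse_fasta_records(lines):
    """Return a list of (description, sequence) tuples from FASTA-style lines."""
    records = []
    desc = None
    chunks = []
    for line in lines:
        line = line.rstrip('\r\n')
        if line.startswith('>'):
            if desc is not None:  # close off the previous record
                records.append((desc, ''.join(chunks)))
            desc = line[1:]  # drop the leading '>'
            chunks = []
        elif desc is not None:  # ignore anything before the first header
            chunks.append(line)
    if desc is not None:  # the last record has no following header
        records.append((desc, ''.join(chunks)))
    return records


sample = """>name_1
data_1
>name_2
data_2
>name_3
data_3
"""

records = parse_fasta_records(sample.splitlines())
print(records)  # [('name_1', 'data_1'), ('name_2', 'data_2'), ('name_3', 'data_3')]

# Two separate lists, as asked for in the question
descriptions = [d for d, _ in records]
sequences = [s for _, s in records]
```

Keeping each description next to its data means the two can never drift out of step, and a sequence split across several lines is joined into one string.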
descriptions, sequences = parse_fasta(open('file', 'r').read().split('\n'))
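When every record is exactly two lines, a header followed by a single data line, as in the file shown in the question, slicing is a simpler alternative to a general parser. This sketch assumes that strict layout and gives wrong results if a sequence spans several lines; the helper name split_headers_and_data is hypothetical:

```python
def split_headers_and_data(lines):
    """Split strictly alternating header/data lines into two lists.

    Assumes lines 0, 2, 4, ... are '>'-prefixed headers and lines 1, 3, 5, ...
    are their data lines, with blank lines already removed.
    """
    if not all(line.startswith('>') for line in lines[0::2]):
        raise ValueError('expected a header on every other line')
    headers = [line[1:] for line in lines[0::2]]  # strip the leading '>'
    data = lines[1::2]
    return headers, data


text = ">name_1\ndata_1\n>name_2\ndata_2\n"
# With a real file you could build lines from open(path).read().splitlines()
lines = [line for line in text.splitlines() if line.strip()]
headers, data = split_headers_and_data(lines)
print(headers)  # ['name_1', 'name_2']
print(data)     # ['data_1', 'data_2']
```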
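The sys.stdin loop in your code does not appear to do anything useful: each line it reads from standard input is discarded (l is never used), and the same file is simply reopened and parsed again for every such line.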
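If the idea behind that loop was to read the data from standard input instead of a hard-coded path, one common pattern is to read a file named on the command line and fall back to sys.stdin otherwise. This is a Python 3 sketch; read_lines is a hypothetical helper, and it accepts argv and stdin as optional parameters so it can be exercised without a terminal:

```python
import io
import sys


def read_lines(argv=None, stdin=None):
    """Return the lines of the file named in argv[1], or of stdin when no name is given.

    argv and stdin default to sys.argv and sys.stdin; passing them explicitly
    makes the function easy to test.
    """
    argv = sys.argv if argv is None else argv
    stdin = sys.stdin if stdin is None else stdin
    if len(argv) > 1:
        with open(argv[1]) as f:
            return f.read().splitlines()
    return stdin.read().splitlines()


# Simulate "no file name given" with an in-memory stand-in for standard input.
lines = read_lines(['script.py'], io.StringIO('>name_1\ndata_1\n'))
print(lines)  # ['>name_1', 'data_1']
```

In a real script, the result can be handed straight to the parser, for example descriptions, sequences = parse_fasta(read_lines()).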