I have one .fa file with a sequence of letters like ACGGGGTTTTGGGCCCGGGGG and a .txt file with numbers that give start and stop positions, like start 2 stop 7. How can I extract only the letters at those positions from my .fa file and create a new file that contains only the letters from the assigned positions? I wrote the code below, but I get the error "string index out of range". My positions .txt file is just a list of positions like [[1,52],[66,88].....
my_file = open('dna.fa')
transcript = my_file.read()
positions = open('exons.txt')
positions = positions.read()
coding_sequence = '' # declare the variable
for i in xrange(len(positions)):
    start = positions[i][0]
    stop = positions[i][1]
    exon = transcript[start:stop]
    coding_sequence = coding_sequence + exon
print coding_sequence
Assuming that your positions are stored in a list called positions, that the name of your infile is infile.fa, and the name of your outfile is outfile.fa:
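# positions is assumed to be a list of [start, stop] pairs, e.g. [[1, 52], [66, 88]]
with open("infile.fa") as infile:
    text = infile.read()
# slice each start:stop range out of the text and concatenate the pieces
letters = "".join(text[start:stop] for start, stop in positions)
with open("outfile.fa", "w") as outfile:
    outfile.write(letters)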
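The list has to be built from exons.txt first. In the question, positions.read() returns one long string, so positions[i] is a single character and positions[i][1] raises "string index out of range". Here is a minimal sketch of the parsing step, assuming the file holds a Python-style list literal such as [[1,52],[66,88]] (parse_positions is a hypothetical helper name, not something from the question):

```python
import ast

def parse_positions(raw):
    """Turn text such as '[[1,52],[66,88]]' into a list of (start, stop) int pairs."""
    # literal_eval only evaluates literals, so it is safer than eval() on file content
    parsed = ast.literal_eval(raw.strip())
    return [(int(start), int(stop)) for start, stop in parsed]

# With a real file: positions = parse_positions(open("exons.txt").read())
positions = parse_positions("[[1,52],[66,88]]")
print(positions)  # [(1, 52), (66, 88)]
```

If the file uses another layout (for example one "start stop" pair per line), the parsing would need to change accordingly.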
As has been mentioned in @KIDJourney's comment, reading the whole file with read() could fail for files too large to fit in memory. Here is how you could do it in that case:
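# positions is assumed to be a list of [start, stop] pairs, e.g. [[1, 52], [66, 88]]
with open("infile.fa") as infile, open("outfile.fa", "w") as outfile:
    i = 0  # index of the current character within the whole file
    for line in infile:
        for char in line:
            # keep the character if it falls inside any start:stop range
            if any(start <= i < stop for start, stop in positions):
                outfile.write(char)
            i += 1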
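Both versions above count every character in the file, including line breaks and any header line. If infile.fa is a regular FASTA file (a ">" header line followed by the sequence wrapped over several lines), the positions probably refer to the sequence alone. Here is a rough sketch under that assumption, for a file with a single record; read_fasta_sequence and extract_regions are illustrative names, and the coordinates are treated as 0-based with stop excluded, which the question does not actually specify:

```python
def read_fasta_sequence(lines):
    """Join the sequence lines of a single-record FASTA file, skipping '>' headers."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))

def extract_regions(sequence, positions):
    """Concatenate sequence[start:stop] for each (start, stop) pair."""
    return "".join(sequence[start:stop] for start, stop in positions)

# Small in-memory example standing in for the lines of infile.fa
example = [">transcript1\n", "ACGGGGTTTT\n", "GGGCCCGGGGG\n"]
sequence = read_fasta_sequence(example)
print(extract_regions(sequence, [(1, 7), (10, 13)]))  # CGGGGTGGG
```

With a real file you would pass the open file object instead of the list, e.g. read_fasta_sequence(infile) inside a with block. If the coordinates turn out to be 1-based and inclusive, slice with sequence[start - 1:stop] instead.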