I'm working on Python code that reads a text file line by line and prints a line together with the line after it if the first starts with ">" and the second starts with "G". To illustrate, I want the following input file...
>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG
>mm10_sample_name_here
>mm10_sample_name_here
AATCGATGCTGCTAGTAGCATG
>mm10_sample_name_here
>mm10_sample_name_here
>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG
To output as...
>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG
>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG
I've tried using next() in the following...
original_file = 'test_input_file.txt'
file_destination = 'test_output_file.txt'
import os
if os.path.exists(file_destination):
    os.remove(file_destination)
f=open(original_file, 'r+')
for line in f:
    try:
        line2 = next(f)
    except StopIteration:
        line2 = ""
    if line2.startswith("G") and line.startswith(">"):
        with open(file_destination, "a") as myfile:
            myfile.write(line)
            myfile.write(line2)
However, it reads the input file two lines at a time, meaning that once a pair of lines fails the if condition, all further lines can end up mismatched. Any help on this would be great. Thanks.
As you have found out, your solution doesn't work. You are advancing the file iterator by two items on every iteration: the for loop consumes one line and your call to next() consumes another. You need a strategy that advances it only once per iteration. One option is to keep state while looping, e.g.
previous_line = ""
for line in f:
    if line.startswith("G") and previous_line.startswith(">"):
        ...
    previous_line = line
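For reference, here is a minimal sketch of how the state-keeping idea could be fleshed out into a complete script. The function names matching_pairs and filter_file are my own, not anything from your code, and I'm assuming it is fine to open the output file once in "w" mode (which truncates old output) rather than removing it and re-opening it in append mode for every match.

```python
def matching_pairs(lines):
    """Yield (header, sequence) for each '>' line directly followed by a 'G' line.

    `lines` can be any iterable of strings, including an open file object.
    """
    previous_line = ""
    for line in lines:
        # Only the previous line is remembered, so the iterator is
        # advanced exactly once per loop iteration.
        if line.startswith("G") and previous_line.startswith(">"):
            yield previous_line, line
        previous_line = line


def filter_file(source_path, destination_path):
    """Copy every matching header/sequence pair from source to destination."""
    # "w" mode truncates an existing file, so no os.remove() is needed
    # (an assumption about what you want; use "a" to keep old content).
    with open(source_path) as source, open(destination_path, "w") as destination:
        for header, sequence in matching_pairs(source):
            destination.write(header)
            destination.write(sequence)
```

With the file names from the question, this would be called as filter_file('test_input_file.txt', 'test_output_file.txt').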
You could also keep the next() call and use, e.g., a while True: loop, but beware of the edge case where several consecutive lines start with ">".
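If you prefer the next() route, one way it might look is sketched below. The name pairs_using_next is hypothetical. The key point is that the line fetched by next() is carried over as the current line for the following round instead of being thrown away, which is what handles several ">" lines in a row.

```python
def pairs_using_next(lines):
    """Yield (header, sequence) pairs, advancing the iterator manually with next()."""
    iterator = iter(lines)
    try:
        current = next(iterator)
        while True:
            following = next(iterator)
            if current.startswith(">") and following.startswith("G"):
                yield current, following
            # Reuse the look-ahead line as the next "current" line, so a '>'
            # line that follows another '>' line still gets its turn.
            current = following
    except StopIteration:
        # Raised by next() once the input is exhausted; simply stop.
        return
```

Since a file object is its own iterator, you can pass an open file directly, e.g. for header, sequence in pairs_using_next(f).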