
Need to Read a Line Ahead Without Reading Two Lines at a Time (Python)


I'm working on a Python script that reads a text file line by line and prints a line together with the line after it, but only if the first line starts with ">" and the second starts with "G". To illustrate, I want the following input file...

>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG
>mm10_sample_name_here
>mm10_sample_name_here
AATCGATGCTGCTAGTAGCATG
>mm10_sample_name_here
>mm10_sample_name_here
>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG

To output as...

>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG
>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG

I've tried using next() in the following...

original_file = 'test_input_file.txt'
file_destination = 'test_output_file.txt'

import os
if os.path.exists(file_destination):
  os.remove(file_destination)

f=open(original_file, 'r+')

for line in f:
  try:
    line2 = next(f)
  except StopIteration:
    line2 = ""
  if line2.startswith("G") and line.startswith(">"):
    with open(file_destination, "a") as myfile:
       myfile.write(line)
       myfile.write(line2)

However, it reads the input file two lines at a time, so once a line fails the if condition, all further lines are mismatched. Any help on this would be great. Thanks.


Solution

  • As you have found out, your solution doesn't work. The file iterator is advanced twice on every pass through the loop: the for statement consumes one line and next(f) consumes another, so the file is processed in fixed, non-overlapping pairs. You need a strategy that advances the iterator only once per pass. One option is to keep state while looping, e.g.

    previous_line = ""
    for line in f:
      if line.startswith("G") and previous_line.startswith(">"):
        ...
      previous_line = line
    

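    You could also keep next() and drive the loop yourself, e.g. with while True:, but beware of the edge case of several consecutive lines starting with ">": the line you read ahead must become the current line of the next pass instead of being discarded.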
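Both strategies can be sketched as small generators. This is a minimal sketch, not a definitive implementation: the names matching_pairs, matching_pairs_with_next and copy_matching_pairs are hypothetical, and the sample lines are copied from the question's input file. The next()-based variant reuses the look-ahead line as the next current line, which is what handles consecutive ">" lines.

```python
def matching_pairs(lines):
    """Yield (header, sequence) for each '>' line directly followed by a 'G' line.

    Keeps the previous line as state, so the iterator advances once per pass.
    """
    previous_line = ""
    for line in lines:
        if previous_line.startswith(">") and line.startswith("G"):
            yield previous_line, line
        previous_line = line


def matching_pairs_with_next(lines):
    """Same idea using next(); the look-ahead line is reused, never dropped."""
    iterator = iter(lines)
    line = next(iterator, None)
    while line is not None:
        following = next(iterator, None)  # None once the input is exhausted
        if (following is not None
                and line.startswith(">")
                and following.startswith("G")):
            yield line, following
        line = following  # the look-ahead becomes the current line


def copy_matching_pairs(source_path, destination_path):
    """Hypothetical helper: write every matching pair to destination_path.

    Opening the destination with "w" truncates an existing file, so the
    separate os.remove() step is not needed here.
    """
    with open(source_path) as source, open(destination_path, "w") as destination:
        for header, sequence in matching_pairs(source):
            destination.write(header)
            destination.write(sequence)


# Sample lines copied from the question's input file.
sample_lines = [
    ">mm10_sample_name_here\n",
    "GATCGATGCTGCTAGTAGCATG\n",
    ">mm10_sample_name_here\n",
    ">mm10_sample_name_here\n",
    "AATCGATGCTGCTAGTAGCATG\n",
    ">mm10_sample_name_here\n",
    ">mm10_sample_name_here\n",
    ">mm10_sample_name_here\n",
    "GATCGATGCTGCTAGTAGCATG\n",
]

for header, sequence in matching_pairs(sample_lines):
    print(header, sequence, sep="", end="")
```

For the files in the question, copy_matching_pairs('test_input_file.txt', 'test_output_file.txt') would apply the same filter while opening each file only once and closing both when done.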