python file iteration file-manipulation

How to read lines without iterating


I have a text file in which I need to extract the chunk of text that follows each header line, and that chunk can span any number of lines (it's a FASTA file, for any bioinformatics people). It's basically set up like this:

> header, info, info
TEXT-------------------------------------------------------
----------------------------------------------------
>header, info...
TEXT-----------------------------------------------------

... and so forth.

I am trying to extract the "TEXT" part. Here's the code I have set up:

for line in ffile:
    if line.startswith('>'):

        # do stuff to header line

        try:
            sequence = ""
            seqcheck = ffile.next()  # line after the header will always be the beginning of TEXT
            while not seqcheck.startswith('>'):
                sequence += seqcheck
                seqcheck = ffile.next()

        except:  # iteration error check
            break

This doesn't work, because every call to next() also advances the iterator that the for loop is using, so I end up skipping a lot of lines and losing a lot of data. How can I just "peek" at the next line without moving the iterator forward?


Solution

  • I think it would be a lot easier to check each line and keep only the ones that don't start with '>'.

    >>> content = '''> header, info, info
    ... TEXT-------------------------------------------------------
    ... ----------------------------------------------------
    ... >header, info...
    ... TEXT-----------------------------------------------------'''
    >>> 
    >>> from io import StringIO  # Python 3; on Python 2 use "from StringIO import StringIO"
    >>> f = StringIO(content)
    >>> 
    >>> my_data = []
    >>> for line in f:
    ...   if not line.startswith('>'):
    ...     my_data.append(line)
    ... 
    >>> ''.join(my_data)
    'TEXT-------------------------------------------------------\n----------------------------------------------------\nTEXT-----------------------------------------------------'
    >>> 
    

    Update:

    @tobias_k this should keep each record's lines separate:

    >>> def get_content(f):
    ...   my_data = []
    ...   for line in f:
    ...     if line.startswith('>'):
    ...       yield my_data
    ...       my_data = []
    ...     else:
    ...       my_data.append(line)
    ...   yield my_data  # the last one
    ... 
    >>> 
    >>> _ = f.seek(0)  # rewind; assigning hides the position that seek() echoes on Python 3
    >>> for i in get_content(f):
    ...   print(i)
    ... 
    []
    ['TEXT-------------------------------------------------------\n', '----------------------------------------------------\n']
    ['TEXT-----------------------------------------------------']
    >>>
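
Building on the generator above, here is a minimal Python 3 sketch that pairs each header with its joined sequence in a single pass, so no peeking is needed. The name `read_fasta` and the sample data are made up for illustration. It assumes every record starts with a line beginning with '>' and it ignores anything that appears before the first header.

```python
import io


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None and line:
            # Sequence line belonging to the current header.
            chunks.append(line)
    # The final record has no following header to close it.
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "> header one\n"
    "ACGT\n"
    "TTGA\n"
    ">header two\n"
    "GGCC\n"
)

records = list(read_fasta(sample))
for header, sequence in records:
    print(header, sequence)
# header one ACGTTTGA
# header two GGCC
```

Because each header line is handled when the loop reaches it, nothing inside the loop has to call next(), and no line gets consumed twice. With a real file, something like `with open(path) as ffile: for header, sequence in read_fasta(ffile): ...` should work the same way.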