python, grep, bioinformatics, fasta, fastq

Python print lines after context


How can I use Python to print lines that come after the line I am interested in?

Example.fastq

@read1
AAAGGCTGTACTTCGTTCCAGTTG
+
'(''%$'))%**)2+'.(&&'/5-
@read2
CTGAGTTGAGTTAGTGTTGACTC
+
)(+-0-2145=588..,(1-,12
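Each record in this file spans four lines: the @ header, the sequence, a + separator, and a quality string with the same length as the sequence. Because of that fixed layout, a file like this can be read one record at a time instead of line by line. The following is a minimal sketch. It assumes unwrapped records (some FASTQ files wrap long sequences over several lines, and this does not handle that) and uses io.StringIO in place of open("Example.fastq"). fastq_records is a hypothetical helper name.

```python
import io

# The two records from Example.fastq, held in memory; a real file opened
# with open("Example.fastq") can be passed in exactly the same way.
EXAMPLE = """@read1
AAAGGCTGTACTTCGTTCCAGTTG
+
'(''%$'))%**)2+'.(&&'/5-
@read2
CTGAGTTGAGTTAGTGTTGACTC
+
)(+-0-2145=588..,(1-,12
"""


def fastq_records(handle):
    """Yield (header, sequence, plus, quality) tuples, four lines at a time.

    Assumes every record is exactly four lines (no wrapped sequences).
    """
    lines = (line.rstrip("\n") for line in handle)
    # Passing the same iterator to zip four times takes four consecutive
    # lines per step.
    return zip(lines, lines, lines, lines)


for header, seq, plus, qual in fastq_records(io.StringIO(EXAMPLE)):
    print(header, len(seq), len(qual))
```

With one tuple per record, "the lines after read1" are simply the plus and quality fields of the tuple whose header matches.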

I can find the line of interest using:

fastq = open("Example.fastq", "r")

IDs = ["read1"]

with fastq as fq:
    for line in fq:
        if any(string in line for string in IDs):
            print(line.strip())  # this is the header line for read1
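One caveat with this substring test is that "read1" in line is also true for a header such as @read10. A stricter check compares the whole read ID. The sketch below makes three assumptions. First, it assumes unwrapped four-line records, so headers are lines 0, 4, 8, and so on. Second, it takes the read ID to be the text after @ up to the first whitespace. Third, it uses a hypothetical helper, matching_headers, with an in-memory io.StringIO standing in for Example.fastq.

```python
import io

# Hypothetical input in place of Example.fastq; the extra "@read10" record
# is also matched by the substring test `"read1" in line`.
EXAMPLE = (
    "@read1\nAAAGGCTG\n+\nIIIIIIII\n"
    "@read10\nCTGAGTTG\n+\nIIIIIIII\n"
)

IDs = {"read1"}  # a set makes each membership test O(1)


def matching_headers(handle, ids):
    """Yield header lines whose read ID is exactly one of `ids`.

    Assumes unwrapped FASTQ: every record is four lines, so headers are
    lines 0, 4, 8, ... (checking for a leading "@" alone is not enough,
    because "@" is also a valid quality character).
    """
    for i, line in enumerate(handle):
        if i % 4 == 0:
            # read ID = text after "@" up to the first whitespace
            fields = line[1:].split(maxsplit=1)
            if fields and fields[0] in ids:
                yield line.rstrip("\n")


print(list(matching_headers(io.StringIO(EXAMPLE), IDs)))  # ['@read1']
```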

Now that I have found read1, I want to print some of the lines that follow it. In bash I might use something like grep -A to do this. The desired output lines look like the following:

+
'(''%$'))%**)2+'.(&&'/5-

But I can't seem to find an equivalent tool in Python. Perhaps "islice" might work, but I don't see how I can get islice to start at the position of the match.

from itertools import islice

with fastq as fq:
    for line in fq:
        if any(string in line for string in IDs):
            print(list(islice(fq, 3, 4)))
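In fact, islice already starts where the loop left off. A file object is a single iterator, so once the for loop has yielded the header line, islice(fq, start, stop) counts from the line after it. The sketch below uses io.StringIO holding the example records in place of the real file, along with a hypothetical helper after_match. It shows that islice(fq, 3, 4) returns the @read2 header, whereas islice(fq, 1, 3) skips the sequence and returns the + and quality lines.

```python
import io
from itertools import islice

# The two records from Example.fastq, held in memory for the demonstration.
EXAMPLE = """@read1
AAAGGCTGTACTTCGTTCCAGTTG
+
'(''%$'))%**)2+'.(&&'/5-
@read2
CTGAGTTGAGTTAGTGTTGACTC
+
)(+-0-2145=588..,(1-,12
"""


def after_match(handle, read_id, start, stop):
    """Hypothetical helper: find the first line containing read_id, then
    return islice(handle, start, stop) taken from the lines after it."""
    for line in handle:
        if read_id in line:
            # The loop has already consumed the header, so islice counts from
            # the next line: 0 = sequence, 1 = "+", 2 = quality, 3 = "@read2".
            return list(islice(handle, start, stop))
    return []


print(after_match(io.StringIO(EXAMPLE), "read1", 3, 4))  # ['@read2\n']
for line in after_match(io.StringIO(EXAMPLE), "read1", 1, 3):
    print(line.rstrip("\n"))
```

The final loop prints the + line followed by the quality line, which matches the desired output.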

Solution

  • You can use next() to advance an iterator (including files):

    print(next(fq))
    print(next(fq))
    

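    This consumes those lines, so the for loop does not see them again and resumes with the next unconsumed line. Once all three lines after the header have been consumed, as in the full version below, the loop continues with @read2.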

    If you don't want the AAA... line, you can consume it with next(fq) without printing it. In full:

    fastq = open("Example.fastq", "r")
    
    IDs = ["read1"]
    
    with fastq as fq:
        for line in fq:
            if any(string in line for string in IDs):
                next(fq)  # skip the sequence (AAA...) line
                print(next(fq).strip())  # strip the trailing newline; print adds its own
                print(next(fq).strip())
    

    which gives

    +
    '(''%$'))%**)2+'.(&&'/5-
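For something closer to grep -A N in general, the same idea of pulling extra lines from the shared iterator can be wrapped in a small generator. The sketch below is a hypothetical helper, grep_after. It uses a plain substring match, not a regex. Unlike grep, it does not re-test lines inside the trailing context and does not print "--" separators between groups. io.StringIO again stands in for the file.

```python
import io
from itertools import islice

EXAMPLE = """@read1
AAAGGCTGTACTTCGTTCCAGTTG
+
'(''%$'))%**)2+'.(&&'/5-
@read2
CTGAGTTGAGTTAGTGTTGACTC
+
)(+-0-2145=588..,(1-,12
"""


def grep_after(lines, pattern, n):
    """Rough counterpart of `grep -A n pattern`: yield each line containing
    `pattern` plus up to n lines that follow it."""
    it = iter(lines)
    for line in it:
        if pattern in line:
            yield line
            # islice pulls the next n lines from the same iterator, so the
            # outer loop resumes after them (they are not tested again).
            yield from islice(it, n)


for line in grep_after(io.StringIO(EXAMPLE), "@read1", 3):
    print(line.rstrip("\n"))
```

This prints the @read1 header followed by its sequence, + and quality lines, much like grep -A 3 @read1 Example.fastq. Passing the file object from open() works the same way, because the helper only iterates over it.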