
Deleting the text preceding a given sentence


I'm working on a file containing a lot of biological data. My input file looks like this:

Start
blah
blah
blah
blah
blah
5'UTR
IMPORTANT STRING
blah
blah
//

Start
blah
blah
blah
5'UTR
IMPORTANT STRING
blah
blah
blah
//

...and so on. This pattern repeats around 4,000 times. The challenge is to check whether the important string contains "NO information". If it does, the entire record (from Start to //) should be deleted; if not, the entire record should be written to a new file.

The problem I'm facing is that "5'UTR" is not recognised as a keyword when I do `for keyword in line`. I also can't seem to delete the entire paragraph. How do I write working code for this in Python?
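If the check was written as `for keyword in line`, that would explain why `5'UTR` is never matched: iterating over a string yields one character at a time, so no single iteration value can equal the five-character token. A substring test with the `in` operator is what matches it. A minimal illustration, assuming `line` holds one line read from the file:

```python
line = "5'UTR\n"

# Iterating over a string yields one character at a time.
chars = [keyword for keyword in line]
print(chars[:5])  # ['5', "'", 'U', 'T', 'R']

# No single character can equal the five-character token...
print(any(keyword == "5'UTR" for keyword in line))  # False

# ...whereas `in` on two strings performs a substring test.
print("5'UTR" in line)  # True
```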


Solution

  • Rather than reading the whole file into memory and running a regex over it, I'd read it one record at a time and `yield` each record. A function that uses `yield` is a generator: it produces its values lazily, only as they are requested, so only one record is held in memory at a time.

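    def records(stream):
        """Yield one record (from Start up to and including //) at a time."""
        lines = []
        for line in stream:
            lines.append(line)
            if line.startswith('//'):
                yield ''.join(lines)
                lines = []
        if lines:  # a trailing record without a closing //
            yield ''.join(lines)

    # 'input.txt' and 'output.txt' are placeholder file names
    with open('input.txt') as data, open('output.txt', 'w') as output:
        for record in records(data):
            # Drop the record if the line after 5'UTR starts with "NO information"
            if "5'UTR\nNO information" not in record:
                output.write(record)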
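The substring check above only matches when "NO information" sits at the very start of the line after `5'UTR`. Since the question asks whether the important string *contains* "NO information", a slightly more tolerant variant inspects that line directly. This is a sketch under two assumptions taken from the sample data: the marker line is exactly `5'UTR`, and the important string is the single line right after it. The helper names (`iter_records`, `has_no_information`, `filter_records`) are hypothetical.

```python
import io

def iter_records(stream):
    """Yield each record (from Start up to and including //) as a list of lines."""
    lines = []
    for line in stream:
        lines.append(line)
        if line.startswith('//'):
            yield lines
            lines = []
    if lines:  # a final record with no closing //
        yield lines

def has_no_information(lines):
    """True if the line right after the 5'UTR line contains 'NO information'."""
    for i, line in enumerate(lines[:-1]):
        if line.strip() == "5'UTR":
            return "NO information" in lines[i + 1]
    return False  # no 5'UTR line found: keep the record

def filter_records(src, dst):
    """Copy every record from src to dst except the 'NO information' ones."""
    for lines in iter_records(src):
        if not has_no_information(lines):
            dst.writelines(lines)

# Small in-memory demo with two records; only the second should survive.
sample = io.StringIO(
    "Start\nblah\n5'UTR\nsome NO information here\nblah\n//\n"
    "\n"
    "Start\nblah\n5'UTR\nIMPORTANT STRING\nblah\n//\n"
)
out = io.StringIO()
filter_records(sample, out)
print(out.getvalue())
```

With real files, pass open file objects instead of `StringIO`, e.g. `with open(in_path) as src, open(out_path, 'w') as dst: filter_records(src, dst)`, where `in_path` and `out_path` are your own paths.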