Search code examples
pythonoptimizationwhile-loopfile-searchlinecache

Optimizing python file search?


I'm having some trouble optimizing this part of code. It works, but seems unnecessary slow. The function searches after a searchString in a file starting on line line_nr and returns the line number for first hit.

import linecache
def searchStr(fileName, searchString, line_nr = 1, linesInFile):
# The above string is the input to this function 
# line_nr is needed to search after certain lines.
# linesInFile is total number of lines in the file.

    while line_nr < linesInFile + 1:
        line = linecache.getline(fileName, line_nr)
        has_match = line.find(searchString)
        if has_match >= 0:
            return line_nr
            break
        line_nr += 1

I've tried something along these lines, but never managed to implement the "start on a certain line number"-input.

Edit: The usecase. I'm post processing analysis files containing text and numbers that are split into different sections with headers. The headers on line_nr are used to break out chunks of the data for further processing.

Example of call:

startOnLine = searchStr(fileName, 'Header 1', 1, 10000000): endOnLine = searchStr(fileName, 'Header 2', startOnLine, 10000000):


Solution

  • Why don't you start with simplest possible implementation ?

    def search_file(filename, target, start_at = 0):
        with open(filename) as infile:
            for line_no, line in enumerate(infile):
                if line_no < start_at:
                    continue
                if line.find(target) >= 0:
                    return line_no
        return None