
pyparsing: iterate file and stop on match


I am opening a file to parse the content. I know the content is at the beginning of a reasonably large file.

Currently, I open and read the entire file and let pyparsing do its thing. This works, but it takes longer than it needs to because of the time spent reading in the file. As a workaround, I limited the file read() to 10 kB, which makes it faster:

with open(p, errors="ignore",newline='') as f:
    try:
        x = final.search_string(f.read(10*1024), 1) # Only first match
    except ParseException as pe:
        print(pe)

However, I cannot be sure that the content I am looking for is in the first 10 kB. Is there a way to get pyparsing to read through the file line by line and stop on the first match?

Note: The content I am trying to match spans multiple lines, so I won't get a full grammar match on any single line, e.g.:

info
{
    a: foo
    b: bar
}

Solution

  • I do not know of any way to make pyparsing read a file incrementally and stop when it hits a match. Failing that, I would simply chunk the input:

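    CHUNKLINES = 20  # lines read per attempt
    BACKUP = 4  # length of expected region - 1


    with open(p, errors="ignore", newline='') as f:
        match = None
        lines = []
        while not match:
            chunk = [f.readline() for _ in range(CHUNKLINES)]
            lines += chunk
            try:
                match = final.search_string("".join(lines), 1)
            except ParseException as pe:
                print(pe)
            if not chunk[-1]:  # readline() returns '' at end of file
                break  # avoid looping forever when there is no match
            # keep the tail in case the region straddles two chunks
            lines = lines[-BACKUP:]


    if match:
        ...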
    

    I've chosen to work with lines rather than bytes because it makes the logic easier. The worst case is that a chunk ends with everything except the last line of the desired region, so we keep that many lines (BACKUP) around for the next attempt. BACKUP therefore has to be at least one less than the number of lines in the longest region you expect to match.
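
The chunking idea can also be wrapped in a small reusable function. The following is only a sketch under a few assumptions: `find_first` and its parameters are hypothetical names, not part of pyparsing, and `search` stands for any callable that takes a string and returns something truthy on a match (for the question's grammar that could be `lambda s: final.search_string(s, 1)`). To keep the example runnable without the original grammar, the demo uses a regular expression as a stand-in.

```python
import io
import re


def find_first(stream, search, chunk_lines=20, backup=4):
    """Feed `stream` to `search` in overlapping chunks of lines.

    Returns the first truthy result of `search`, or None if the
    stream is exhausted without a match.
    """
    lines = []
    while True:
        chunk = [stream.readline() for _ in range(chunk_lines)]
        lines += chunk
        result = search("".join(lines))
        if result:
            return result
        if not chunk[-1]:  # readline() returns '' once the file is exhausted
            return None
        # Keep the tail so a region straddling two chunks is seen whole
        # on the next attempt (lines[-0:] would keep everything, hence the guard).
        lines = lines[-backup:] if backup else []


# Demo: the block starts on line 19, so it straddles the boundary
# between the first and second 20-line chunks.
text = "junk\n" * 18 + "info\n{\n    a: foo\n    b: bar\n}\n" + "junk\n" * 100
pattern = re.compile(r"info\s*\{\s*a: (\w+)\s*b: (\w+)\s*\}")  # stand-in for the grammar

stream = io.StringIO(text)
found = find_first(stream, pattern.search)
remaining = len(stream.read())  # characters that never had to be read

not_found = find_first(io.StringIO("junk\n" * 50), pattern.search)
```

With an open file in place of the `io.StringIO` object, reading stops shortly after the chunk that completes the match, so the rest of the file is never loaded. Larger `chunk_lines` values mean fewer search attempts but more text read past the match.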