
How can I do a non-greedy (backtracking) match with OneOrMore etc. in pyparsing?


I am trying to parse a partially standardized street address into its components using pyparsing. I want a non-greedy match for a street name that may be N tokens long.

For example:

444 PARK GARDEN LN

Should be parsed into:

number: 444
street: PARK GARDEN
suffix: LN

How would I do this with pyparsing? Here's my initial code:

from pyparsing import *

def main():
    street_number = Word(nums).setResultsName('street_number')
    street_suffix = oneOf("ST RD DR LN AVE WAY").setResultsName('street_suffix')
    street_name = OneOrMore(Word(alphas)).setResultsName('street_name')

    address = street_number + street_name + street_suffix
    result = address.parseString("444 PARK GARDEN LN")
    print(result.dump())

if __name__ == '__main__':
    main()

But when I try parsing it, the greedy OneOrMore consumes the street suffix as part of the street name, so street_suffix has nothing left to match.


Solution

  • Use the negative lookahead operator, ~, so that each word is added to street_name only if it is not a street_suffix.

    from pyparsing import *
    
    street_number = Word(nums)('street_number')
    street_suffix = oneOf("ST RD DR LN AVE WAY")('street_suffix')
    street_name = OneOrMore(~street_suffix + Word(alphas))('street_name')
    
    address = street_number + street_name + street_suffix
    result = address.parseString("444 PARK GARDEN LN")
    print(result.dump())
    

    In addition, you don't have to call setResultsName; calling the expression with the name, as in the code above, does the same thing. IMHO it leads to a much cleaner grammar definition.
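
One caveat with the lookahead approach: oneOf matches literal text, so ~street_suffix also rejects words that merely start with a suffix (DRAKE starts with DR). The sketch below is one possible variation, not part of the original answer. It builds the suffix from Keyword expressions so only whole words count, and it joins the name words into a single string with a parse action. The SUFFIXES list and the parse_address helper are hypothetical names introduced for illustration, and the camelCase method names are assumed to be available in the installed pyparsing version.

```python
from pyparsing import Keyword, MatchFirst, OneOrMore, Word, alphas, nums

# Assumed suffix list, copied from the question.
SUFFIXES = "ST RD DR LN AVE WAY".split()

street_number = Word(nums)("street_number")

# Keyword matches whole words only, so "DR" does not match the start of "DRAKE".
street_suffix = MatchFirst([Keyword(s) for s in SUFFIXES])("street_suffix")

# Accept a word only if it is not a suffix, then join the words into one string.
street_name = OneOrMore(~street_suffix + Word(alphas)).setParseAction(
    lambda tokens: " ".join(tokens)
)("street_name")

address = street_number + street_name + street_suffix


def parse_address(text):
    """Hypothetical helper: return the three named fields as a plain dict."""
    result = address.parseString(text)
    return {
        "number": result["street_number"],
        "street": result["street_name"],
        "suffix": result["street_suffix"],
    }


print(parse_address("444 PARK GARDEN LN"))
# {'number': '444', 'street': 'PARK GARDEN', 'suffix': 'LN'}
print(parse_address("12 DRAKE ST"))
# {'number': '12', 'street': 'DRAKE', 'suffix': 'ST'}
```

Note that the lookahead still stops at the first suffix word it sees, so a street name that itself contains a suffix word in the middle would need a stricter lookahead than this sketch uses.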