python python-3.x pyparsing

pyparsing exception explanation: need more precise information on error location


Here is a piece of code which is producing an exception:

from pyparsing import *

strg = 'ab3jh-lokdk-12345-lopf9$ab3jh-lokdk-12345-lopfr'
# strg = 'ab3jh-lokdk-12345-lopf9$ab3jh-lokdk-12345-lopf9'

D = Suppress('-')
TYPE_1 = Regex(r'\w{2}\d\w{2}').setName('tp1')
TYPE_2 = Regex(r'\w{5}').setName('tp2')
TYPE_3 = Regex(r'\d{5}').setName('tp3')
TYPE_4 = Regex(r'\w{4}\d').setName('tp4')

ELM = Group(TYPE_1 + D + TYPE_2 + D + TYPE_3 + D + TYPE_4).setName('elm')

ALL = ELM + (Literal('$')+ELM)[...]

try:
    result = ALL.parseString(strg,parseAll=True)
except ParseException as parse_err:
    print(parse_err.explain())

The printed exception explanation is the following:

ab3jh-lokdk-12345-lopf9$ab3jh-lokdk-12345-lopfr
                       ^
ParseException: Expected end of text, found '$'  (at char 23), (line:1, col:24)
pyparsing.core.StringEnd - StringEnd

As one can see, the error is reported at the dollar sign when it is actually at the last character. Is there a way to have pyparsing indicate where the error actually is (that is, at the last character of the second group)?


Solution

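  • Yes, pyparsing has a way of bubbling up exceptions a little too far sometimes. Here, the second ELM fails inside the repetition, so the parser treats that as "no more repetitions", backs up to the "$", and then reports that it expected the end of the text there. You can override this by using the "-" operator instead of "+" as a way of saying "if there is a parse mismatch after here, don't back up and try something else". It raises a ParseSyntaxException instead of a ParseException, so to catch both you need to use "except ParseBaseException".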

    In your parser, try replacing ELM with:

    ELM = Group(TYPE_1 + D - TYPE_2 + D + TYPE_3 + D + TYPE_4).setName('elm')
    

    It's basically saying "once you see TYPE_1 and "-", there had better be the rest of a valid ELM afterward, and if not, it's a syntax error." Don't go overboard, though, and replace every "+" with "-". Do it only in those places where you want the parser to commit to parsing the rest of that expression, or else give up with an exception.
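
To make the difference concrete, here is a minimal sketch that parses the question's input with both variants and reports where each one fails. It assumes pyparsing 3.x is installed. build_parser and first_error are hypothetical helper names introduced only for this illustration; they are not part of pyparsing.

```python
import pyparsing as pp


def build_parser(commit):
    """Build the grammar from the question.

    commit=True uses "-" after the first delimiter (as suggested above);
    commit=False keeps the original all-"+" version.
    """
    D = pp.Suppress('-')
    TYPE_1 = pp.Regex(r'\w{2}\d\w{2}').setName('tp1')
    TYPE_2 = pp.Regex(r'\w{5}').setName('tp2')
    TYPE_3 = pp.Regex(r'\d{5}').setName('tp3')
    TYPE_4 = pp.Regex(r'\w{4}\d').setName('tp4')
    if commit:
        body = TYPE_1 + D - TYPE_2 + D + TYPE_3 + D + TYPE_4
    else:
        body = TYPE_1 + D + TYPE_2 + D + TYPE_3 + D + TYPE_4
    ELM = pp.Group(body).setName('elm')
    return ELM + (pp.Literal('$') + ELM)[...]


def first_error(parser, text):
    """Return the exception raised while parsing text, or None on success."""
    try:
        parser.parseString(text, parseAll=True)
    except pp.ParseBaseException as err:
        # ParseBaseException covers both ParseException and ParseSyntaxException
        return err
    return None


strg = 'ab3jh-lokdk-12345-lopf9$ab3jh-lokdk-12345-lopfr'

backtracking_err = first_error(build_parser(commit=False), strg)
committed_err = first_error(build_parser(commit=True), strg)

for label, err in (('with +', backtracking_err), ('with -', committed_err)):
    print(label, type(err).__name__, 'at char', err.loc)
    print(err.line)
    print(' ' * (err.col - 1) + '^')  # err.col is 1-based
```

With "+", the reported location should be the "$" at char 23, as in the question. With "-", it should move to char 42, the start of the failing tp4 token "lopfr". That is the beginning of the mismatched token rather than its very last character, because the Regex fails as a whole.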