Here is a piece of code which is producing an exception:
from pyparsing import *
strg = 'ab3jh-lokdk-12345-lopf9$ab3jh-lokdk-12345-lopfr'
# strg = 'ab3jh-lokdk-12345-lopf9$ab3jh-lokdk-12345-lopf9'
D = Suppress('-')
TYPE_1 = Regex(r'\w{2}\d\w{2}').setName('tp1')
TYPE_2 = Regex(r'\w{5}').setName('tp2')
TYPE_3 = Regex(r'\d{5}').setName('tp3')
TYPE_4 = Regex(r'\w{4}\d').setName('tp4')
ELM = Group(TYPE_1 + D + TYPE_2 + D + TYPE_3 + D + TYPE_4).setName('elm')
ALL = ELM + (Literal('$')+ELM)[...]
try:
    result = ALL.parseString(strg, parseAll=True)
except ParseException as parse_err:
    print(parse_err.explain())
The printed exception explanation is the following:
ab3jh-lokdk-12345-lopf9$ab3jh-lokdk-12345-lopfr
                       ^
ParseException: Expected end of text, found '$' (at char 23), (line:1, col:24)
pyparsing.core.StringEnd - StringEnd
As one can see, the error is reported at the dollar sign when it is actually in the last token of the second group (its final character). Is there a way to have pyparsing indicate where the error actually is?
Yes, pyparsing sometimes bubbles exceptions up a little too far. You can override this by using the "-" operator instead of "+" as a way of saying "if there is a parse mismatch after here, don't back up and try something else". It raises a ParseSyntaxException instead of a ParseException, so to catch both kinds you need to use except ParseBaseException.
In your parser, try replacing ELM with:
ELM = Group(TYPE_1 + D - TYPE_2 + D + TYPE_3 + D + TYPE_4).setName('elm')
It's basically saying "once you see TYPE_1 and "-", there had better be a valid ELM afterward, and if not, it's a syntax error." Don't go overboard, though, and replace every "+" with "-". Do it only in those places where you want the parser to commit to parsing the rest of that expression, or else give up with an exception.
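Putting the pieces together, here is a minimal sketch of the whole parser with the "-" operator and the broader except clause. It assumes pyparsing 3 (for the [...] notation and the explain() method). The helper name try_parse is made up for illustration and is not part of pyparsing.

```python
import pyparsing as pp

D = pp.Suppress('-')
TYPE_1 = pp.Regex(r'\w{2}\d\w{2}').setName('tp1')
TYPE_2 = pp.Regex(r'\w{5}').setName('tp2')
TYPE_3 = pp.Regex(r'\d{5}').setName('tp3')
TYPE_4 = pp.Regex(r'\w{4}\d').setName('tp4')

# The "-" after the first delimiter commits the parser: a mismatch in
# anything that follows raises ParseSyntaxException instead of backtracking.
ELM = pp.Group(TYPE_1 + D - TYPE_2 + D + TYPE_3 + D + TYPE_4).setName('elm')
ALL = ELM + (pp.Literal('$') + ELM)[...]


def try_parse(text):
    """Hypothetical helper: return (results, error); one of the two is None."""
    try:
        return ALL.parseString(text, parseAll=True), None
    except pp.ParseBaseException as err:
        # ParseBaseException covers both ParseException and ParseSyntaxException.
        return None, err


strg = 'ab3jh-lokdk-12345-lopf9$ab3jh-lokdk-12345-lopfr'
results, err = try_parse(strg)
if err is not None:
    print(err.explain())
```

With this change the reported location should move from the dollar sign to the start of the token that fails to match (lopfr), not to its last character, because a Regex element reports a mismatch at the position where it began matching.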