expr = Ref('expr')
block = '{' + Repeat(expr) + '}'
expr = block | Token(re='[0-9]')
START = expr
Here's a grammar using the lrparsing
module for Python. The module reports no conflicts in the grammar.
It fails to parse the string {{0}}
with the error
lrparsing.ParseError: line 1 column 5: Got '}' when expecting __end_of_input__ while trying to match block in state 11
The stack state step by step is:
shift '{'; ItemSet:5='{'
shift '{'; ItemSet:5='{' ItemSet:5='{'
shift /[0-9]/; ItemSet:4=/[0-9]/ ItemSet:5='{' ItemSet:5='{'
reduce '}'; ItemSet:4=/[0-9]/ -- ItemSet:7=expr ItemSet:5='{' ItemSet:5='{'
reduce '}'; ItemSet:7=expr -- ItemSet:9=expr ItemSet:5='{' ItemSet:5='{'
shift '}'; ItemSet:11='}' ItemSet:9=expr ItemSet:5='{' ItemSet:5='{'
Which afaik means it's shifting {{0
, then upon seeing }
reducing 0
to expr
, then reducing on }
again without having shifted it first, which confuses the bajeezus out of me.
Is this a bug, or am I doing something infinitely simple and stupid?
If it's my grammar, how would I refactor it to satisfy my eternally steamy and passionate desires? If it's a bug, could someone direct me to a python module that has syntax most similar to lrparsing that does work?
EDIT: Refactoring as:
blocks = Ref('blocks')
block = Ref('block')
expr = Ref('expr')
blocks = blocks + block | THIS*0 # THIS*0 is the idiomatic way of getting the empty string
block = '{' + expr + '}'
expr = blocks | Token(re='[0-9]')
START = expr
Allows for proper parsing. The question for me now, is... why? I feel like lrparsing
would've complained to me earlier about any parsing problems... Does Repeat
simply not work the way I expect it to?
lrparsing has a bug; it does not consider the recursive repeats correctly.
Your actual problem can be solved by simple recursion, as you did in your extended edit, though with a bit less clutter.
block = Ref('block')
block = '{' + block + '}' | Token(re='[0-9]')
START = block
Note also, that your original grammar would have allowed input such as {{0{1}}}
. (The reason being that the repeatable part can be expanded to either simple numbers or expr
again.) Considering your second grammar, you probably didn't want that anyways.
I did get some work done with pyparsing, but the syntax is considerably different. Analogous example:
from pyparsing import Forward, Literal, nums, oneOf, Word
l = lambda c: Literal(c).suppress()
block = Forward()
block << (Word(nums, exact=1) ^ l('{') + block + l('}'))
print(block.parseString("{{0}}"))
Output:
['0']
Hope that helps.