
pyparsing not parsing the whole string


I have the following grammar and test case:

from pyparsing import Word, nums, Forward, Suppress, OneOrMore, Group

#A grammar for a simple class of regular expressions
number = Word(nums)('number')
lparen = Suppress('(')
rparen = Suppress(')')

expression = Forward()('expression')

# expr('name') returns a named copy; calling setResultsName() without
# assigning its result would silently discard that copy
concatenation = Group(expression + expression)('concatenation')

disjunction = Group(lparen + OneOrMore(expression + Suppress('|')) + expression + rparen)('disjunction')

kleene = Group(lparen + expression + rparen + '*')('kleene')

expression << (number | disjunction | kleene | concatenation)

#Test a simple input
tests = """
(8)*((3|2)|2)
""".splitlines()[1:]

for t in tests:
    print(t)
    print(expression.parseString(t))
    print()

The result should be

[['8', '*'],[['3', '2'], '2']]

but instead, I only get

[['8', '*']]

How do I get pyparsing to parse the whole string?
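A minimal way to see the symptom in isolation, assuming a cut-down, hypothetical version of the grammar above (the names atom, text, partial and consumed_all are illustrative, not from the question). It shows two real pyparsing behaviours: | builds a MatchFirst that stops at the first alternative that matches, and parseString ignores leftover input unless parseAll=True is passed.

```python
from pyparsing import Group, ParseException, Suppress, Word, nums

# Hypothetical cut-down grammar: a number, or "(number)*"
number = Word(nums)
kleene = Group(Suppress('(') + number + Suppress(')') + '*')

# '|' builds a MatchFirst: alternatives are tried left to right and the
# first one that matches wins
atom = number | kleene

text = "(8)*((3|2)|2)"

# By default parseString returns as soon as the expression has matched,
# silently ignoring whatever input is left over
partial = atom.parseString(text).asList()
print(partial)  # [['8', '*']]

# parseAll=True turns the leftover input into a ParseException instead
try:
    atom.parseString(text, parseAll=True)
    consumed_all = True
except ParseException as err:
    consumed_all = False
    print("parse stopped early:", err)
```

Passing parseAll=True does not fix the grammar, but it makes a partial match fail loudly instead of looking like a successful parse.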


Solution

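  • Your concatenation expression is not doing what you want, and comes close to being left-recursive (fortunately it is the last term in your expression, so it is only tried after every other alternative has failed). Because | builds a MatchFirst, expression stops at the first alternative that succeeds: kleene matches (8)*, and since parseString does not require the whole input to be consumed unless you pass parseAll=True, the rest of the string is silently dropped. Your grammar works if you instead make expression a sequence of one or more terms: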

    expression << OneOrMore(number | disjunction | kleene)
    

    With this change, I get this result:

    [['8', '*'], [['3', '2'], '2']]
    

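    EDIT: You can also sidestep the precedence of << over | (which otherwise forces you to wrap the alternatives in parentheses) by using the <<= operator instead, since an augmented assignment evaluates the whole right-hand side first: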

    expression <<= OneOrMore(number | disjunction | kleene)
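Putting the pieces of the answer together, here is a sketch of the full corrected grammar, assuming pyparsing's camelCase API as used in the question. The helper name parse_regex is hypothetical, and passing parseAll=True is an addition not in the original answer, so that any unconsumed input raises ParseException instead of being dropped.

```python
from pyparsing import Forward, Group, OneOrMore, ParseException, Suppress, Word, nums

number = Word(nums)
lparen = Suppress('(')
rparen = Suppress(')')

expression = Forward()

# (a|b|...|z): one or more "expression |" pairs, then a final expression
disjunction = Group(lparen + OneOrMore(expression + Suppress('|')) + expression + rparen)

# (a)*: a parenthesized expression followed by a literal '*'
kleene = Group(lparen + expression + rparen + '*')

# Concatenation is now expressed by OneOrMore rather than a separate rule;
# <<= evaluates the whole right-hand side first, so no extra parentheses
expression <<= OneOrMore(number | disjunction | kleene)


def parse_regex(text):
    """Hypothetical helper: parse text, raising ParseException on leftovers."""
    return expression.parseString(text, parseAll=True).asList()


if __name__ == "__main__":
    print(parse_regex("(8)*((3|2)|2)"))  # [['8', '*'], [['3', '2'], '2']]
    try:
        parse_regex("(8)*)")
    except ParseException as err:
        print("rejected:", err)
```

Dropping the concatenation rule also removes the left-recursive path altogether: every remaining alternative starts with either a digit or an opening parenthesis, so expression never calls itself at the same position.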