Search code examples
pythonpython-3.xpyparsing

pyparsing partial match or recursion limit hit


Using pyparsing, I seem to be having trouble understanding why my grammar either matches partially or hits a recursion limit.

lparens = Suppress("(")
rparens = Suppress(")")
name = Word(alphanums, exact=1)
expression = Forward()
function = Group(Literal("λ") + OneOrMore(name) + Literal(".").suppress() + expression)
application = Group(expression + expression) #<-- the problem *I think*
exp1 = ( name | function | application )
exp2 = (lparens + exp1 + rparens) | exp1
expression << exp2

So the following will parse but only picks up the "a" and does not apply the application step:

expression.parseString("ab") #result is: (['a'], {})
expression.parseString("(ab)") #result is: exception - recursion limit reached.

In the first example why did it stop at 'a' and not apply the application step and run into the same infinite look it does in the second example?

In the second example it matches '(' and so it needs the ')' to finish the exp1. So it should parse the name 'a' and since there is no following ')' it should abandon that and try function (which fails) and then move on to application. Then it parses name 'a' (first match) and should move on to name 'b' completing the name, the application, and then match ')' to complete the exp1 and the expression.

Obviously this is not what happens.

edit: forgot to add the following works:

expression.parseString("((λa.a)(λa.a))")

` #result: ([([(['λ', 'a', 'a'], {}), (['λ', 'a', 'a'], {})], {})], {})

adding setDebug() and setName() to the various elements and parsing '(ab)' yields:

Match expression at loc 0(1,1)
Match exp2 at loc 0(1,1)
Match lparens at loc 0(1,1)
Matched lparens -> []
Match exp1 at loc 1(1,2)
Match name at loc 1(1,2)
Matched name -> ['a']
Matched exp1 -> ['a']
Match rparens at loc 2(1,3)
Exception raised:Expected ")" (at char 2), (line:1, col:3)
Match name at loc 0(1,1)
Exception raised:Expected name (at char 0), (line:1, col:1)
Match function at loc 0(1,1)
Exception raised:Expected "λ" (at char 0), (line:1, col:1)
Match application at loc 0(1,1)
Match expression at loc 0(1,1)
Match exp2 at loc 0(1,1)
Match lparens at loc 0(1,1)
Matched lparens -> []
Match exp1 at loc 1(1,2)
Match name at loc 1(1,2)
Matched name -> ['a']
Matched exp1 -> ['a']
Match rparens at loc 2(1,3)
Exception raised:Expected ")" (at char 2), (line:1, col:3)
Match name at loc 0(1,1)
Exception raised:Expected name (at char 0), (line:1, col:1)
Match function at loc 0(1,1)
Exception raised:Expected "λ" (at char 0), (line:1, col:1)
Match application at loc 0(1,1)
Match expression at loc 0(1,1)
Match exp2 at loc 0(1,1)
Match lparens at loc 0(1,1)
Matched lparens -> []
Match exp1 at loc 1(1,2)
Match name at loc 1(1,2)
... etc etc etc...

Solution

  • I struggled with your question, as I did not really have a good grasp of just what you were trying to parse. Fortunately, the lambda character in your function definition was enough of a tip-off that I searched for some background on lambda calculus notation, and found a decent description here: http://palmstroem.blogspot.com/2012/05/lambda-calculus-for-absolute-dummies.html

    From this I came up with a simple BNF:

    expr ::= (name | function | '(' expr ')')+
    function ::= 'λ' name+ '.' expr
    name ::= a single character 'a'..'z' or 'A'..'Z'
    

    Translating to pyparsing from here looks like:

    lparens = Suppress("(")
    rparens = Suppress(")")
    dot = Suppress('.')
    # work around output encoding issues by displaying as '^'
    λ = Literal('λ').setParseAction(replaceWith('^'))
    
    # Forward() placeholder for recursive expression
    expression = Forward()
    
    # implement BNF bottom-up
    name = oneOf(list(alphas))
    function = Group(λ + Group(OneOrMore(name))("head") + dot + expression("body"))
    lambda_term = name | function | lparens + expression + rparens
    expression <<= Group(OneOrMore(lambda_term))
    

    This parses your sample expression as:

    ((λa.a)(λa.a))
    [[[[['^', ['a'], ['a']]], [['^', ['a'], ['a']]]]]]
    

    This seems like a lot of extra nesting to wade through. We can reduce some of it by tweaking expression slightly, to Group only if there are 2 or more expressions, otherwise parse a single ungrouped expression. The tuple multiplication feature lets us define this as:

    expression <<= Group(lambda_term*(2,)) | lambda_term
    

    Giving:

    [[['^', ['a'], 'a'], ['^', ['a'], 'a']]]
    

    Or more clearly:

    [
        [
            ['^', ['a'], 'a'], 
            ['^', ['a'], 'a']
        ]
    ]
    

    In your posted parser, you also had a concept of "application". I speculate that what you call an application is what the cited article-for-dummies refers to as "resolving". To resolve a function, you take subsequent expressions one-for-one with each name in the function head, and replace the name in the body with its respective expression. I tried to comprehend this at parse time, but struggled with expressions where the function was nested within other functions, or the head and body were defined in nested parentheses, with the replacement expressions separated from the function by several nesting levels. I came to the conclusion that resolution has to be done after parsing of names and functions is complete. The structure that pyparsing imparts to the results should make it straightforward to walk the results and successively resolve names with expressions.

    Here is the complete parser code:

    """
    (from http://palmstroem.blogspot.com/2012/05/lambda-calculus-for-absolute-dummies.html)
    
    BNF:
      expr ::= (name | function | '(' expr ')')+
      name ::= ['a'..'z' 'A'..'Z']+
      function ::= lambda name+ '.' expr
    """
    
    lparens = Suppress("(")
    rparens = Suppress(")")
    dot = Suppress('.')
    # work around output encoding issues by displaying as '^'
    λ = Literal('λ').setParseAction(replaceWith('^'))
    
    # Forward() placeholder for recursive expression
    expression = Forward()
    
    # implement BNF bottom-up
    name = oneOf(list(alphas))
    function = Group(λ + Group(OneOrMore(name))("head") + dot + expression("body"))
    lambda_term = name | function | lparens + expression + rparens
    #~ expression <<= Group(OneOrMore(lambda_term))
    expression <<= Group(lambda_term*(2,)) | lambda_term
    
    
    tests = """\
        ((λa.a)(λa.a))
        (λy.xy) ab
        λx.λy.xzy
    """.splitlines()
    for t in tests:
        t = t.strip()
        print(t.replace('λ','^'))
        expression.parseString(t, parseAll=True).pprint()
        print()