Search code examples
javascriptnode.jslambda-calculuspegleft-recursion

Avoiding Left Recursion parsing Lambda Calculus while maintaining Left Associativity


I am trying to parse lambda calculus terms into AST leveraging JavaScript and PEG.JS. The grammar is fairly easy:

/*****************************************************************
        t ::=
                x                                       variable
                λx.t                                    abstraction
                t t                                     application
*****************************************************************/

From which I have coded out the PEG:

TERM "term"
    = ABSTRACTION
    / APPLICATION
    / VARIABLE

APPLICATION "application"
    /*****************************************************************
        application ::= t t
    *****************************************************************/
    = APPLICATION_W_PARENS
    / APPLICATION_WO_PARENS

ABSTRACTION "abstraction"
    /*****************************************************************
        abstraction ::= λx.t
    *****************************************************************/
    = ABSTRACTION_W_PARENS
    / ABSTRACTION_WO_PARENS

VARIABLE "variable"
    /*****************************************************************
        variable ::= x
    *****************************************************************/
    =  x:CHARACTER
    {
        return Variable(location(), x)
    }

//////////////////////////////////////////////////////////////////////
// Application
//////////////////////////////////////////////////////////////////////

ABSTRACTION_OR_VARIABLE
    //////////////////////////////////////////////////////////////////
    // "Left recursive grammar" workaround "term term" enters a loop
    //      assuming the left side cannot match Application
    //      remediates the left recursion issue
    //////////////////////////////////////////////////////////////////
    = ABSTRACTION / VARIABLE

APPLICATION_W_PARENS
    /*****************************************************************
        '(' -> Abstraction | Variable -> Term -> ')'
    *****************************************************************/
    = L_PARENS lhs:ABSTRACTION_OR_VARIABLE rhs:TERM R_PARENS
    {
        return Application(location(), lhs, rhs, true)
    }

APPLICATION_WO_PARENS
    /*****************************************************************
        Abstraction | Variable -> Term
    *****************************************************************/
    = lhs:ABSTRACTION_OR_VARIABLE rhs:TERM
    {
        return Application(location(), lhs, rhs, false)
    }

//////////////////////////////////////////////////////////////////////
// Abstraction
//////////////////////////////////////////////////////////////////////

ABSTRACTION_W_PARENS "abstraction"
    /*****************************************************************
            '(' -> 'λ' -> Variable -> '.' -> TERM -> ')'
    *****************************************************************/
    = L_PARENS LAMBDA x:CHARACTER DOT term:TERM R_PARENS
    {
        return Abstraction(location(), x, term, true)
    }

ABSTRACTION_WO_PARENS
    /*****************************************************************
            'λ' -> Variable -> '.' -> Term
    *****************************************************************/
   = LAMBDA x:CHARACTER DOT term:TERM
   {
        return Abstraction(location(), x, term, false)
   }

//////////////////////////////////////////////////////////////////////
// Atoms
//////////////////////////////////////////////////////////////////////

LAMBDA "lambda"
    = 'λ'

L_PARENS "lParens"
    = '('

R_PARENS "rParens"
    = ')'

DOT "dot"
    = [\.]

CHARACTER "character"
    = [A-Za-z]
    {
        return text().trim() ;
    }

This compiles and runs fine on simple input. As I start to push through the examples to test the implementation I see some issues. Given the term

λl.λm.λn.lmn

It parses into

{
    "expr": "λl.λm.λn.lmn",
    "ast": " Abstraction( l,  Abstraction( m,  Abstraction( n, Application(  Variable( l ), Application(  Variable( m ),  Variable( n ) ) ) ) ) )"
}

The problem is in Left Application m should be applied to l and then n to that result. As you can see by the printout of the AST that n is applied to m and that result is applied to l which is not correct.

IF I change the rule that is in place to prevent left recursion issues where the application assumes that the left side is only a variable or an abstraction to include the possibility of application - then there is the recursion issue.

I introduced the concept of parens - but I stopped integrating them in. I really don't want them in the grammar.

  1. Can we fix this in the PEG.JS?
  2. OR Should I rewrite the construction of the Application Object (hack)?
  3. OR Is there a better way to parse this - e.g. roll a custom parser?

Solution

  • Flawed Approach: One approach I toyed with was to force the match of more than two Variables | Abstractions in a row and left apply them

    APP_2
        = lhs:ABSTRACTION_OR_VARIABLE rhs:ABSTRACTION_OR_VARIABLE
        {
            return Application(location(), lhs, rhs, false, "APP2")
        }
    
    APP_3
        = lhs:APP_2 rhs:TERM
        {
            return Application(location(), lhs, rhs, false, "APP3")
        }
    
    APPLICATION_WO_PARENS
        = APP_3
        / APP_2
    

    Which looks like it works when the application is of three terms. When there are four we get a balanced tree of 2 levels where we want an unbalanced tree of three levels... SO this is the incorrect output of the previous PEG (with input of lmno):

    {
        "expr": "lmno",
        "ast": "Application::APP3( 
                  Application::APP2(  Variable(l),  Variable(m) ),
                  Application::APP2(  Variable(n),  Variable(o) ) 
                )"
    }
    

    So I could build out any number of APP_2 ... APP_99 rules and force the left application. Which would work - until I exceeded the 99 (or whatever) number of applications. The solution would be a real hack and fragile.


    Working Hack Approach: Wanting to hack something together I changed the approach to matching an array of terms as an application:

    APP_ARR
        = terms:ABSTRACTION_OR_VARIABLE*
       {
            return reduceTerms(location(), terms)
       }
    
    APPLICATION_WO_PARENS
        = APP_ARR
    

    This approach requires me to write a bit of code to build the structure (reduceTerms) which I was trying to avoid. Here is the code:

    const reduceTerms = function(info, terms){
        const initialLhs = terms.shift()
        const initialRhs = terms.shift()
        const initialApp = Application(info, initialLhs, initialRhs, false, 'reduceTerms')
    
        const appAll = terms.reduce(
            (lhs, rhs) => Application(info, lhs, rhs, false, 'reduceTerms'),
            initialApp
        )
    
        return appAll ;
    }
    

    Please ignore the boolean and 'reduceTerms' character string. The boolean is used to indicate whether this application was contained within parens (going to remove the parens concept until I encounter it later in the book). The character string is a label on how/where I constructed this instance of the Application node (to debug how the parser is applying the rules).

    The reduceTerms function is a simple reduction of an array into the Application tree applying the terms in left application. The initial object is an Application of the left two items in the terms array. The reducing function will take that initial object as the lhs and the next term up as the rhs, which is exactly what you want. Here is the working output:

    {
        "expr": "lmno",
        "ast": "Application::reduceTerms( 
                  Application::reduceTerms( 
                    Application::reduceTerms(  Variable(l),  Variable(m) ),  
                  Variable(n) ),  
                Variable(o) )"
    }
    

    The one problem here is that I would need to do some extra work to get the info object to accurately reflect the matches. In this version the info object contains the range from which all the terms were matched out of. While not critical it would be nice to have that all tie out.


    So I am still looking for a solution that does this inside of the PEG without matching on the array and reducing into a tree.


    Progress on removing the left recursion: Using the published approach translating

    A -> A α | β
    

    to

    A -> β A'
    A' -> α A' | ε
    

    In PEG I have

    TERM_OR_VARIABLE
        = L_PARENS TERM R_PARENS
        / VARIABLE
    
    APP
       = lhs:TERM_OR_VARIABLE rhs:APP_
       {
           return Application(location(), lhs, rhs, false, "APP")
       }
    
    APP_
        = lhs:TERM_OR_VARIABLE rhs:APP_
        {
            return Application(location(), lhs, rhs, false, "APP_")
        }
        / lhs:TERM_OR_VARIABLE END
        {
            return lhs
        }
        / END
    
    END
        = !.
    
    APPLICATION_WO_PARENS
        = APP
    

    I am now able to parse the application lmno but the AST is right to left application

    Application::APP(  
        Variable(l), 
        Application::APP_(  
           Variable(m), 
           Application::APP_(  
              Variable(n),  
              Variable(o) 
           ) 
        ) 
    )