Search code examples
javascriptparsingabstract-syntax-treepegpegjs

Eliminate left-recursion in unambiguous PEG grammar


I've read through a bunch of existing questions here on StackOverflow, but I can't get my grammar straight.

statement "Statement" =
    assignment / primitive / reference / operation
operation "Operation" =
    statement operator:operator statement
operator "Operator" =
    "+"

* Note that I might add more operator patterns to the operator rule (i.e. "==", "**", "/") later on. Thus, it could become more complex.

I'm actually using PEG.js instead of plain PEG here, hence the unconventional syntax.

The compiler complains to me about statement visiting the operation , which in turn visits the statement and so on, which leads to a possible infinite loop. This is commonly known as left-recursion.

For this particular case, I wasn't able to understand how I could resolve this issue. I didn't quite get the idea of the head and tail pattern for this scenario.


Note: I would like to re-use the statement rule in other rules as conveniently as possible. So splitting it in a dozen of individual rules as some work-around could be a solution, but won't really help me as an answer.


Solution

  • With the great help of some friend who understood the whole PEG parsing architecture within 10 minutes by the way , I was able to solve the problem for this scenario.

    Let's ask this:

    In an operation in the form of left+right , will we ever have an assignment (of the form a=b) or another operation (of the form a+b) on the left side?

    The answer is most likely no.

    Hence, this grammar works:

    statement "Statement" =
        assignment / operation / primitive / reference
    operation "Operation" =
        (primitive / reference) operator statement
    operator "Operator" =
        "+"
    

    As you can see, I just inlined the statement rule in some kind by replacing it with the primitive / reference part from the statement rule, omitting the assignment and operation parts since we don't need them on the left side.

    This is just the solution. It's working fine.

    However, if you're wondering why parsing inputs like 1+2+3+4+5 works although we obviously don't allow recursive operation s on the left side of the +, be aware that we still allow them on the right side. And right-recursion is a totally valid thing in PEG.