Search code examples
recursioncontext-free-grammarebnflivescriptgrammar-kit

Avoiding left recursion in parsing LiveScript object definitions


I'm working on a parser for LiveScript language, and am having trouble with parsing both object property definition forms — key: value and (+|-)key — together. For example:

prop: "val"
+boolProp
-boolProp
prop2: val2

I have the key: value form working with this:

Expression ::= TestExpression
    | ParenExpression
    | OpExpression
    | ObjDefExpression
    | PropDefExpression
    | LiteralExpression
    | ReferenceExpression

PropDefExpression ::= Expression COLON Expression

ObjDefExpression ::= PropDefExpression (NEWLINE PropDefExpression)*

// ... other expressions

But however I try to add ("+"|"-") IDENTIFIER to PropDefExpression or ObjDefExpression, I get errors about using left recursion. What's the (right) way to do this?


Solution

  • The grammar fragment you posted is already left-recursive, i.e. without even adding (+|-)boolprop, the non-terminal 'Expression' derives a form in which 'Expression' reappears as the leftmost symbol:

    Expression -> PropDefExpression -> Expression COLON Expression
    

    And it's not just left-recursive, it's ambiguous. E.g.

    Expression COLON Expression COLON Expression
    

    can be derived in two different ways (roughly, left-associative vs right-associative).

    You can eliminate both these problems by using something more restricted on the left of the colon, e.g.:

    PropDefExpression ::= Identifier COLON Expression
    

    Also, another ambiguity: Expression derives PropDefExpression in two different ways, directly and via ObjDefExpression. My guess is, you can drop the direct derivation.

    Once you've taken care of those things, it seems to me you should be able to add (+|-)boolprop without errors (unless it conflicts with one of the other kinds of expression that you didn't show).

    Mind you, looking at the examples at http://livescript.net, I'm doubtful how much of that you'll be able to capture in a conventional grammar. But if you're just going for a subset, you might be okay.