Search code examples
parsingrecursionrecursive-descentleft-recursion

Parsing + and * in boolean expressions by recursive descent


I am writing a recursive descent parser for Boolean expressions, for example:

(1 * 0) 
(0 + ~1)
(0 * (1 + c)

Where 1 is 'True', 0 is 'False', + is 'or', * is 'and', ~ is 'not' and 'c' is just some variable name (it could be any single alphabetic letter). I plan on using parentheses rather than implementing some kind of order of operations.

My current parser can recognize the following form of expression

Expression ::= 1
             | 0 
             | Character
             | ~ Expression

But I am unsure as to how I would implement + and * on top of this. I am fairly certain from what I have read the obvious implementation of

Expression ::= 1
             | 0
             | Character
             | ( Expression + Expression )
             | ( Expression * Expression )

Would cause an infinite loop as it is 'left-recursive'. I am unsure how to change this to remove such infinite recursion.


Solution

  • With the parenthesis in place, what you have there is not left recursive. Left recursion is when a production can reach itself (directly or indirectly) with no tokens consumed in between. Such grammars do indeed cause infinite recursion in recursive descent parsers, but that can't happen with yours.

    You do have the issue that the grammar as it stands is ambiguous: After a parenthesis, it isn't known whether the + or the * form is being parsed until the entire left-hand expression has been parsed.

    One way of getting around that issue is by pulling up the common parts in a shared prefix/suffix production:

    Expression ::= 1
                 | 0
                 | Character
                 | ParExpr
    
    ParExpr ::= ( Expression ParOp  Expression )
    
    ParOp ::= +
            | *