Search code examples
compiler-constructiongrammarcontext-free-grammar

How do I make grammar with optional prefix LL(2)?


Consider this grammar

start: lvalue ASSIGN expr SEMICOLON | expr SEMICOLON;
expr: OPENPAREN expr CLOSEPAREN | literal | lvalue;
lvalue: ID lvalue_tail ;
lvalue_tail: OPENBRACK expr CLOSEBRACK | ;
ID : [a-zA-Z]+[a-zA-Z0-9_]* ;

so lvalue is an expr, but an expr may not be an lvalue.
with this input var[10], the grammar will need 4 lookahead (ID, openbrack, literal, closebrack) before it can determine if it should choose lvalue expr or expr.
How do I make such grammar LL(2)?
note: expr here is simplified, symbol with capital letters are terminals.


Solution

  • I'm not sure what purpose might be served by reducing the LL lookahead of this grammar, compared with just using an LALR parser generator, since the unmodified grammar is LALR(1).

    It can, however, be done by separating expr into lvalue and what we might call rvalue -- i.e., expressions which are not usable on the left side of an assignment operator. That might produce something like

    start: lvalue start_tail | rvalue SEMICOLON;
    start_tail: ASSIGN expr SEMICOLON | SEMICOLON;
    expr: lvalue | rvalue;
    rvalue: OPENPAREN expr CLOSEPAREN | literal;
    lvalue: ID lvalue_tail ;
    lvalue_tail: OPENBRACK expr CLOSEBRACK | ;
    ID : [a-zA-Z]+[a-zA-Z0-9_]* ;
    

    This grammar is LL(1) in addition to being LL(2). However, it may prove to be tricky to scale it up to include more of the language.

    (In case the above link rots, it just points to an online tool which verified that the grammar is LL(1).)