Consider this grammar
start: lvalue ASSIGN expr SEMICOLON | expr SEMICOLON;
expr: OPENPAREN expr CLOSEPAREN | literal | lvalue;
lvalue: ID lvalue_tail ;
lvalue_tail: OPENBRACK expr CLOSEBRACK | ;
ID : [a-zA-Z]+[a-zA-Z0-9_]* ;
so lvalue
is an expr
, but an expr
may not be an lvalue
.
with this input var[10]
, the grammar will need 4 lookahead (ID, openbrack, literal, closebrack) before it can determine if it should choose lvalue expr
or expr
.
How do I make such grammar LL(2)?
note: expr
here is simplified, symbol with capital letters are terminals.
I'm not sure what purpose might be served by reducing the LL lookahead of this grammar, compared with just using an LALR parser generator, since the unmodified grammar is LALR(1).
It can, however, be done by separating expr
into lvalue
and what we might call rvalue
-- i.e., expressions which are not usable on the left side of an assignment operator. That might produce something like
start: lvalue start_tail | rvalue SEMICOLON;
start_tail: ASSIGN expr SEMICOLON | SEMICOLON;
expr: lvalue | rvalue;
rvalue: OPENPAREN expr CLOSEPAREN | literal;
lvalue: ID lvalue_tail ;
lvalue_tail: OPENBRACK expr CLOSEBRACK | ;
ID : [a-zA-Z]+[a-zA-Z0-9_]* ;
This grammar is LL(1) in addition to being LL(2). However, it may prove to be tricky to scale it up to include more of the language.
(In case the above link rots, it just points to an online tool which verified that the grammar is LL(1).)