I am trying to define a non terminal symbol in a LALR(1) grammar (with CUP parser). It is requested that
the <code> token must appear exactly two times,
while <hour> token must appear at least one time.
In the end I came up with this definition:
section ::= hour_l CODE SC hour_l CODE SC hour_l ;
hour_l ::= /* epsilon */
| hour_l HOUR SC ;
Where SC
is a separator (semicolon) between tokens and hour_l
is the non terminal symbol for hour's list. This solution has a problem: HOUR
can be not present, because epsilon can be reduced to hour_l
. There is a clever solution than specifying all possibilities or using the semantic capabilities of CUP (ie. putting a counter of how many times HOUR
is present in section)? I'd prefer not to use semantics in order to achieve this; in fact, it seems to me it is syntax related.
My solution, suggested OOB by a friend, is to use a Finite State Machine:
section ::= c ;
a ::= CODE SC ;
b ::= a CODE SC ;
c ::= c HOUR SC | b HOUR SC | e CODE SC ;
d ::= HOUR SC | d HOUR SC ;
e ::= e HOUR SC | a HOUR SC | d CODE SC ;
c
is the final state accepted by this machine. I drew a Deterministic Finite Automata
and transformed it in a Right Regular Grammar.
Hope it helps. Thanks