Search code examples
parsinggrammarformal-languagescup

How to define at least one occurrence of a string between two tokens in a CUP parser grammar


I am trying to define a non terminal symbol in a LALR(1) grammar (with CUP parser). It is requested that

the <code> token must appear exactly two times, 
while <hour> token must appear at least one time.

In the end I came up with this definition:

section     ::= hour_l CODE SC hour_l CODE SC hour_l ;
hour_l      ::= /* epsilon */ 
            | hour_l HOUR SC ;

Where SC is a separator (semicolon) between tokens and hour_l is the non terminal symbol for hour's list. This solution has a problem: HOUR can be not present, because epsilon can be reduced to hour_l. There is a clever solution than specifying all possibilities or using the semantic capabilities of CUP (ie. putting a counter of how many times HOUR is present in section)? I'd prefer not to use semantics in order to achieve this; in fact, it seems to me it is syntax related.


Solution

  • My solution, suggested OOB by a friend, is to use a Finite State Machine:

    section     ::= c ;
    a           ::= CODE SC ;
    b           ::= a CODE SC ;
    c           ::= c HOUR SC | b HOUR SC | e CODE SC ;
    d           ::= HOUR SC | d HOUR SC ;
    e           ::= e HOUR SC | a HOUR SC | d CODE SC ;
    

    c is the final state accepted by this machine. I drew a Deterministic Finite Automata

    DFA

    and transformed it in a Right Regular Grammar.

    Hope it helps. Thanks