
JavaCC custom errors cause "Expansion can be matched by empty string."


I wish to make my JavaCC parser return custom error messages that are more specific than the defaults.

I currently have a basic structure that is something like this:

Foo(): {} { (A() B())+ }
A():   {} { <TOKA1> | <TOKA2> }
B():   {} { <TOKB1> | <TOKB2> }

I have been researching how to throw custom error messages and the standard method seems to be something like:

A():   {} { <TOKA1> | <TOKA2> 
          | {throw new ParseException("Expected A, found " + getToken(1).image + ".");} }

However, implementing this on A and B causes the compiler to produce an error:

Expansion within "(...)+" can be matched by empty string.

This is understandable, since JavaCC does not 'know' that the empty alternatives throw and therefore end the parse; as far as it can tell, the loop body could match the empty string indefinitely. Nevertheless, I can't find or think of any other easy way to throw errors like this. What is the best way of achieving my desired result?


Solution

  • Suppose your parser has taken one or more trips through the loop. Wouldn't you want the parser to leave the loop when the next token is not a TOKA1 or a TOKA2?

    E.g. if the rest of your grammar is

    void Start() : {} { Foo() C() <EOF> }
    

    you definitely do not want an error if the input is

     <TOKA1> <TOKB1> <TOKC> <EOF>
    

    where <TOKC> is some token that could be at the start of a C.

    So what I'd suggest is

     Foo(): {} { 
        AForSure()
        B() 
        (A() B())* }
     AForSure() : {} { A() | {throw new ParseException("Expected A, found " + getToken(1).image + ".");} }
     A():   {} { <TOKA1> | <TOKA2> }
     B():   {} { <TOKB1> | <TOKB2> | {throw new ParseException("Expected B, found " + getToken(1).image + ".");} }
    

    That might not give you quite the quality of error messages you want. E.g. if the input is

     <TOKA1> <TOKB1> <TOKB2> <EOF>
    

    you might get the error "Expected C, found a TOKB2", so you may want to change that message to "Expected A or C, found a TOKB2".
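
    To experiment with how the loop version behaves on the two inputs above, here is a minimal hand-written model of it in plain Java. It is only a sketch, not what JavaCC generates: the class name LoopSketch, the check helper, token kinds represented as strings, a C that consists of a single TOKC, and IllegalStateException standing in for ParseException are all assumptions made for illustration.

    ```java
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical hand-written model of: Start : Foo C EOF, Foo : AForSure B (A B)*
    class LoopSketch {
        private final List<String> tokens;
        private int pos = 0;

        LoopSketch(List<String> tokens) { this.tokens = tokens; }

        // Plays the role of getToken(1): look at the next token without consuming it.
        private String peek() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

        private boolean atA() { return peek().equals("TOKA1") || peek().equals("TOKA2"); }
        private boolean atB() { return peek().equals("TOKB1") || peek().equals("TOKB2"); }

        // Stand-in for throwing a ParseException with a custom message.
        private void fail(String expected) {
            throw new IllegalStateException("Expected " + expected + ", found " + peek() + ".");
        }

        void start() {
            foo();
            // Assumption: C is a single TOKC. An error here comes right after the
            // loop, where another A would also have been acceptable.
            if (!peek().equals("TOKC")) fail("A or C");
            pos++;
            if (!peek().equals("EOF")) fail("EOF");
        }

        private void foo() {
            if (!atA()) fail("A");   // AForSure: the first A is mandatory
            pos++;
            b();
            while (atA()) {          // (A B)*: leave quietly on anything that is not an A
                pos++;
                b();
            }
        }

        private void b() {
            if (!atB()) fail("B");
            pos++;
        }

        // Returns "ok" for accepted input, otherwise the error message.
        static String check(String... input) {
            try {
                new LoopSketch(Arrays.asList(input)).start();
                return "ok";
            } catch (IllegalStateException e) {
                return e.getMessage();
            }
        }
    }
    ```

    Calling LoopSketch.check("TOKA1", "TOKB1", "TOKC") accepts the input, while LoopSketch.check("TOKA1", "TOKB1", "TOKB2") reports the combined "A or C" message, because the check that follows the loop is the one that fails.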


    Another way to approach it is to use recursion instead of looping. Suppose your list is always in parentheses, so, for example, the only use of Foo is in Bar and Bar looks like this

    Bar() : {} { "(" Foo() ")" }
    

    So you want to exit the Foo loop only when you hit a ")". Anything else is an error. You can rewrite the grammar as

     Bar() : {} { "(" A("A") B() MoreFoos() }
    
     MoreFoos() : {} {  ")" |  A("A or ')'") B() MoreFoos() }
     A(String expected):   {} { <TOKA1> | <TOKA2> 
        | {throw new ParseException("Expected "+expected+", found " + getToken(1).image + ".");} }
     B():   {} { <TOKB1> | <TOKB2>
        | {throw new ParseException("Expected B, found " + getToken(1).image + ".");} }
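
    The recursive formulation can be modelled the same way by hand. The following plain-Java sketch mirrors Bar, MoreFoos, A and B one method per production; it is not JavaCC output, and the class name RecursiveSketch, the check helper, string token kinds, and IllegalStateException in place of ParseException are assumptions made for illustration.

    ```java
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical hand-written model of: Bar : "(" A("A") B MoreFoos
    class RecursiveSketch {
        private final List<String> tokens;
        private int pos = 0;

        RecursiveSketch(List<String> tokens) { this.tokens = tokens; }

        // Plays the role of getToken(1): look at the next token without consuming it.
        private String peek() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

        // Stand-in for throwing a ParseException with a custom message.
        private void fail(String expected) {
            throw new IllegalStateException("Expected " + expected + ", found " + peek() + ".");
        }

        void bar() {
            if (!peek().equals("(")) fail("'('");
            pos++;
            a("A");
            b();
            moreFoos();
        }

        // MoreFoos : ")" | A("A or ')'") B MoreFoos
        private void moreFoos() {
            if (peek().equals(")")) {   // the only legal way out of the list
                pos++;
                return;
            }
            a("A or ')'");              // the caller says what was acceptable here
            b();
            moreFoos();
        }

        private void a(String expected) {
            if (!peek().equals("TOKA1") && !peek().equals("TOKA2")) fail(expected);
            pos++;
        }

        private void b() {
            if (!peek().equals("TOKB1") && !peek().equals("TOKB2")) fail("B");
            pos++;
        }

        // Returns "ok" for accepted input, otherwise the error message.
        static String check(String... input) {
            try {
                new RecursiveSketch(Arrays.asList(input)).bar();
                return "ok";
            } catch (IllegalStateException e) {
                return e.getMessage();
            }
        }
    }
    ```

    Passing the expected description into a is what lets the same method produce "Expected A" at the start of the list and "Expected A or ')'" after at least one pair has been read.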