
JavaCC lookahead option doesn't work


I'm using JavaCC 6.0 and I need to set the lookahead option to 2 because of the following choice conflict:

double Func() :
{}
{
    <STRING> "(" ( (<STRING> | Expression() ) "," )*  ")"
}

The conflict exists because an Expression() can begin with a <STRING> and I'm getting "Consider using a lookahead of 2 for earlier expansion."

So I changed the lookahead option to
options { LOOKAHEAD = 2; FORCE_LA_CHECK = true; ...}

but I'm still getting the same warning and the parser fails when it needs to detect an expression instead of a string.

Am I doing something wrong, or does the lookahead option just not work?


Solution

  • I never set the global LOOKAHEAD option to anything other than 1. Instead, I use a local look-ahead exactly where it is needed. In your case I would do the following:

    double Func() :
    {}
    {
        <STRING>
        "("
        ( 
             ( LOOKAHEAD( <STRING> "," )
               <STRING>
             | Expression()
             )
             ","
        )*
        ")"
    }
    

    It's rather odd that you require a comma after the final argument. If you don't want that, you can do this:

    double Func() :
    {}
    {
        <STRING>
        "("
        ( 
             ( LOOKAHEAD( <STRING> ("," | ")") ) <STRING> | Expression() )
             (  ","
                ( LOOKAHEAD( <STRING> ("," | ")") ) <STRING> | Expression() )
             )*
        )?
        ")"
    }
    

    However, in both snippets above, the look-ahead specifications violate the advice in FAQ 4.8 that the tokens scanned by a syntactic look-ahead specification should all be consumed by the choice. This could be a problem if you ever use Func() itself in a look-ahead spec. For the first syntax, this is easy to deal with: distribute the comma like this:

    double Func() :
    {}
    {
        <STRING>
        "("
        ( 
             ( LOOKAHEAD(<STRING> "," )
               <STRING> ","
             | Expression() ","
             )
        )*
        ")"
    }
    

    For the second syntax (the one without a comma after the last argument), you can use recursion like this:

    double Func() :
    {}
    {
        <STRING> "("  ( ")" | Args() )
    }
    
    void Args() :
    {}
    {
        LOOKAHEAD( <STRING> ("," | ")") )
        <STRING>
        ("," Args() | ")" )
    |   
        Expression()
        ("," Args() | ")" )
    }
    

    The recursive version also has the benefit that it has less repetition than the iterative version.
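
To make the decision in the recursive Args() rule concrete, here is a minimal hand-written Java sketch of the same logic. It is not what JavaCC generates, and every name in it (ArgsSketch, Kind, peek, classify, the toy expression grammar with NUMBER and PLUS) is a hypothetical stand-in chosen for illustration. The point it tries to show is that the parser takes the <STRING> branch only when the token after the string is "," or ")", and otherwise falls through to the expression branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hand-written analogue of the recursive Args() rule; an illustration, not JavaCC output. */
class ArgsSketch {
    // Hypothetical token kinds used only by this sketch.
    enum Kind { STRING, NUMBER, PLUS, COMMA, RPAREN, EOF }

    private final List<Kind> tokens;
    private int pos = 0;
    // Records which alternative was chosen for each argument.
    final List<String> trace = new ArrayList<>();

    ArgsSketch(List<Kind> tokens) { this.tokens = tokens; }

    // Look at the token `offset` positions ahead without consuming anything.
    private Kind peek(int offset) {
        int i = pos + offset;
        return i < tokens.size() ? tokens.get(i) : Kind.EOF;
    }

    // Consume one token of the given kind or fail.
    private void expect(Kind k) {
        if (peek(0) != k) {
            throw new IllegalStateException("expected " + k + " but found " + peek(0));
        }
        pos++;
    }

    // Plays the role of LOOKAHEAD( <STRING> ("," | ")") ):
    // true only when the string is the entire argument.
    private boolean stringIsWholeArgument() {
        return peek(0) == Kind.STRING
            && (peek(1) == Kind.COMMA || peek(1) == Kind.RPAREN);
    }

    // Args := ( STRING | Expression ) ( "," Args | ")" )
    void args() {
        if (stringIsWholeArgument()) {
            expect(Kind.STRING);
            trace.add("string");
        } else {
            expression();
            trace.add("expression");
        }
        if (peek(0) == Kind.COMMA) {
            expect(Kind.COMMA);
            args();               // recurse for the next argument
        } else {
            expect(Kind.RPAREN);  // the closing parenthesis ends the list
        }
    }

    // Toy expression (an assumption): operand ( "+" operand )*
    private void expression() {
        operand();
        while (peek(0) == Kind.PLUS) {
            expect(Kind.PLUS);
            operand();
        }
    }

    // An operand may itself start with STRING, which is the source of the conflict.
    private void operand() {
        if (peek(0) == Kind.STRING) expect(Kind.STRING);
        else expect(Kind.NUMBER);
    }

    // Parses an argument list (everything after the opening parenthesis) and returns the trace.
    static List<String> classify(Kind... kinds) {
        ArgsSketch parser = new ArgsSketch(Arrays.asList(kinds));
        parser.args();
        return parser.trace;
    }
}
```

For the token sequence STRING "," STRING "+" NUMBER ")", classify returns the list [string, expression]: the first string is followed by a comma and is taken as a plain string, while the second is followed by "+" and is parsed as the start of an expression.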