
Using LOOKAHEAD in nested conditions


I am maintaining old code that uses JavaCC to parse syntax.

My .jjt file broadly looks like this:

void Top() {}
{
    Bazz() OpenParenthesis() Foo() CloseParenthesis()
}
void Foo() {}
{
    Bar() Blah() A()
}

void A() {}
{
    Z() (B() Z())*
}
void Z() {}
{
    (OpenParenthesis())? X() Y() (CloseParenthesis())?
}

Legend:

  • Top is the main condition before <EOF>, enclosed in a method returning an instance of Node
  • OpenParenthesis and CloseParenthesis represent literal tokens for ( and ) respectively
  • The parser is instructed to ignore whitespace

My issue is that with a "simple" input like:

bazz ( bar blah x y )

... the closing parenthesis is consumed by the optional (CloseParenthesis())? at the end of Z. The mandatory closing parenthesis in Top then fails with a syntax error, in which the parser expects either B or <EOF>.
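To make the failure concrete, here is a minimal hand-written recursive-descent sketch of the grammar above. It is not JavaCC output, only an approximation of a one-token decision at each optional. The class name GreedyZDemo, the helper tryParse, and the token spellings (in particular "b" for B) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hand-written stand-in for the question's grammar; this is not JavaCC output. */
public class GreedyZDemo {
    private final List<String> tokens;
    private int pos = 0;

    private GreedyZDemo(String input) {
        // Whitespace only separates tokens, as in the question's grammar.
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String wanted) {
        if (!peek().equals(wanted)) {
            throw new IllegalStateException("expected " + wanted + " but found " + peek());
        }
        pos++;
    }

    // Z: ( "(" )? X Y ( ")" )?
    private void z() {
        if (peek().equals("(")) expect("(");
        expect("x");
        expect("y");
        // One-token decision: the optional is taken whenever the next token is ")".
        if (peek().equals(")")) expect(")");
    }

    // A: Z ( B Z )*   ("b" as the spelling of B is an assumption)
    private void a() {
        z();
        while (peek().equals("b")) {
            expect("b");
            z();
        }
    }

    // Top followed by <EOF>, with Foo inlined: bazz "(" bar blah A ")"
    private void top() {
        expect("bazz");
        expect("(");
        expect("bar");
        expect("blah");
        a();
        expect(")");
        expect("<EOF>");
    }

    /** Returns "ok" on success, otherwise the first error message. */
    public static String tryParse(String input) {
        try {
            new GreedyZDemo(input).top();
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("bazz ( bar blah x y )"));
        System.out.println(tryParse("bazz ( bar blah x y ) )"));
    }
}
```

In this sketch the sample input is rejected because z() has already taken the only ")", while the same input with a second ")" is accepted. The decision inside z() looks at the next token only, which is also what JavaCC's default lookahead of 1 does.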

JavaCC's expansion syntax does not offer the fine-grained quantifiers that Java regexes do, so I cannot make Z's closing parenthesis a reluctant match.

I have read about the LOOKAHEAD construct in the JavaCC tutorial and docs, and figured I could use it to decide when the final closing parenthesis should not be consumed by Z, so I rewrote Z as:

void Z() {}
{
    (OpenParenthesis())? X() Y() (LOOKAHEAD(1) CloseParenthesis())?
}

I've also monkeyed around with the size of the lookahead.

Unfortunately, either I do not understand the feature, or the lookahead does not work with a hierarchical syntax such as the one illustrated above.

The poor workarounds I have found so far are:

  • Remove the optional parenthesis from Z altogether
  • Make the closing parenthesis optional in Top

Obviously neither satisfies me at all.

Have I overlooked something?


Solution

  • Maybe I'm missing the intent of your code, but the rule

    void Z() {}
    {
        (OpenParenthesis())? X() Y() (CloseParenthesis())?
    }
    

    just looks very odd to me. Do you really want all of the following sequences to be parsed as Z?

    ( x y )
    x y
    ( x y
    x y )
    

    If you think the last two should not be parsed as Z, then change the rule to

    void Z() {}
    {
        OpenParenthesis() X() Y() CloseParenthesis()
    |
        X() Y()
    }
    

    If you really, really want Z to stay the way it is, leave a comment and I'll submit a solution that will work, at least for the example above.
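As a rough check of the revised rule, here is a minimal hand-written recursive-descent sketch in plain Java. It is not JavaCC output. The class name RevisedZDemo, the helper tryParse, and the token spellings (in particular "b" for B) are assumptions made for illustration. The two alternatives of Z start with different tokens, so a single token of lookahead is enough to choose between them.

```java
import java.util.Arrays;
import java.util.List;

/** Hand-written stand-in for the grammar with the revised Z; this is not JavaCC output. */
public class RevisedZDemo {
    private final List<String> tokens;
    private int pos = 0;

    private RevisedZDemo(String input) {
        // Whitespace only separates tokens, as in the question's grammar.
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String wanted) {
        if (!peek().equals(wanted)) {
            throw new IllegalStateException("expected " + wanted + " but found " + peek());
        }
        pos++;
    }

    // Z: "(" X Y ")"  |  X Y
    private void z() {
        if (peek().equals("(")) {
            // First alternative: the parentheses must come as a pair.
            expect("(");
            expect("x");
            expect("y");
            expect(")");
        } else {
            // Second alternative: no parentheses at all.
            expect("x");
            expect("y");
        }
    }

    // A: Z ( B Z )*   ("b" as the spelling of B is an assumption)
    private void a() {
        z();
        while (peek().equals("b")) {
            expect("b");
            z();
        }
    }

    // Top followed by <EOF>, with Foo inlined: bazz "(" bar blah A ")"
    private void top() {
        expect("bazz");
        expect("(");
        expect("bar");
        expect("blah");
        a();
        expect(")");
        expect("<EOF>");
    }

    /** Returns "ok" on success, otherwise the first error message. */
    public static String tryParse(String input) {
        try {
            new RevisedZDemo(input).top();
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("bazz ( bar blah x y )"));
        System.out.println(tryParse("bazz ( bar blah ( x y ) b x y )"));
    }
}
```

With this shape, Z only consumes a ")" that it opened itself, so the closing parenthesis in the sample input is left for Top.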