javascript parsing grammar ecmascript-5 javacc

LOOKAHEADs for the JavaScript/ECMAScript array literal production


I am currently implementing a JavaScript/ECMAScript 5.1 parser with JavaCC and have problems with the ArrayLiteral production.

ArrayLiteral :
    [ Elision_opt ]
    [ ElementList ]
    [ ElementList , Elision_opt ]

ElementList :
    Elision_opt AssignmentExpression
    ElementList , Elision_opt AssignmentExpression

Elision :
    ,
    Elision ,

I have three questions, which I'll ask one at a time.

This is the second one.


I have simplified this production to the following form:

ArrayLiteral:
    "[" ("," | AssignmentExpression ",") * AssignmentExpression ? "]"

Please see the first question on whether it is correct or not:

How to simplify JavaScript/ECMAScript array literal production?
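
One way to sanity-check the simplified form is a brute-force comparison against the specification grammar. The sketch below is only an illustration under stated assumptions: AssignmentExpression is replaced by the single character e, both grammars are flattened by hand into regular expressions, and the names ArrayLiteralForms, SPEC, SIMPLIFIED and countMismatches are made up for this example. It enumerates every string of commas and e's up to a given length and counts the strings on which the two forms disagree.

```java
import java.util.regex.Pattern;

final class ArrayLiteralForms {
    // Specification grammar, flattened by hand ('e' stands in for AssignmentExpression):
    //   [ Elision_opt ]  |  [ ElementList ]  |  [ ElementList , Elision_opt ]
    // with ElementList = ,* e ( ,+ e )*
    static final Pattern SPEC =
            Pattern.compile("\\[(,*|,*e(,+e)*|,*e(,+e)*,+)\\]");

    // Simplified form from the question: "[" ("," | e ",")* e? "]"
    static final Pattern SIMPLIFIED =
            Pattern.compile("\\[(,|e,)*e?\\]");

    // Counts the strings over {',', 'e'} of length <= maxLen (wrapped in brackets)
    // that one form accepts and the other rejects.
    static int countMismatches(int maxLen) {
        int mismatches = 0;
        for (int len = 0; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder("[");
                for (int i = 0; i < len; i++) {
                    sb.append(((bits >> i) & 1) == 0 ? ',' : 'e');
                }
                sb.append(']');
                String s = sb.toString();
                if (SPEC.matcher(s).matches() != SIMPLIFIED.matcher(s).matches()) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }
}
```

Calling ArrayLiteralForms.countMismatches(10) is a cheap check for small inputs; it does not replace a proof, and it says nothing about the LOOKAHEAD problem itself.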

Now I have tried to implement it in JavaCC as follows:

void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |   AssignmentExpression()
        ","
    ) *
    (
        AssignmentExpression()
    ) ?
    "]"
}

JavaCC complains about a choice conflict involving "," or AssignmentExpression (more precisely, the tokens an AssignmentExpression can start with). Obviously, a LOOKAHEAD specification is required. I have spent a lot of time trying to figure out the LOOKAHEADs and tried different things, such as

  • LOOKAHEAD (AssignmentExpression() ",") in (...)*
  • LOOKAHEAD (AssignmentExpression() "]") in (...)?

and a few other variations, but I could not get rid of the JavaCC warning.

I fail to understand why this does not work:

void ArrayLiteral() :
{
}
{
    "["
    (
        LOOKAHEAD ("," | AssignmentExpression() ",")
        ","
    |   AssignmentExpression()
        ","
    ) *
    (
        LOOKAHEAD (AssignmentExpression() "]")
        AssignmentExpression()
    ) ?
    "]"
}

OK, AssignmentExpression() per se is ambiguous, but the trailing "," or "]" in the LOOKAHEADs should make it clear which of the choices should be taken, or am I mistaken here?

What would a correct LOOKAHEAD specification for this production look like?

Update

This did not work, unfortunately:

void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |
        LOOKAHEAD (AssignmentExpression() ",")
        AssignmentExpression()
        ","
    ) *
    (
        AssignmentExpression()
    ) ?
    "]"
}

Warning:

Warning: Choice conflict in (...)* construct at line 6, column 5.
         Expansion nested within construct and expansion following construct
         have common prefixes, one of which is: "function"
         Consider using a lookahead of 2 or more for nested expansion.

Line 6 is the ( that opens the (...)* construct, before the first LOOKAHEAD. The common prefix "function" is simply one of the possible starts of AssignmentExpression.


Solution

  • Here is yet another approach. It has the advantage of identifying which commas indicate undefined elements without using any semantic actions.

    void ArrayLiteral() : {} { "[" MoreArrayLiteral() }
    
    void MoreArrayLiteral() : {} {
        "]"
    |   "," /* undefined item */ MoreArrayLiteral()
    |   AssignmentExpression() ( "]" | "," MoreArrayLiteral() )
    }
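
To see why this shape needs only one token of lookahead, here is a minimal hand-written Java sketch of the same recursion. It is not JavaCC output and rests on loud assumptions: tokens are single characters, AssignmentExpression is stood in for by a single letter or digit, and the class and method names (ArrayLiteralSketch, parse, moreArrayLiteral) are hypothetical. Each alternative starts with a different token ("]", ",", or the start of an expression), so every decision is made by peeking at the next character, and a comma seen at the start of moreArrayLiteral is exactly an undefined element (recorded here as null).

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayLiteralSketch {
    private final String src;
    private int pos = 0;
    private final List<String> elements = new ArrayList<>();

    private ArrayLiteralSketch(String src) { this.src = src; }

    // Parses e.g. "[1,,2]" and returns its elements; null marks an undefined item.
    static List<String> parse(String src) {
        ArrayLiteralSketch p = new ArrayLiteralSketch(src);
        p.expect('[');
        p.moreArrayLiteral();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("trailing input at " + p.pos);
        }
        return p.elements;
    }

    // Mirrors MoreArrayLiteral(): three alternatives with distinct first tokens.
    private void moreArrayLiteral() {
        char c = peek();
        if (c == ']') {
            pos++;                          // "]"
        } else if (c == ',') {
            pos++;                          // "," /* undefined item */
            elements.add(null);
            moreArrayLiteral();
        } else {
            elements.add(assignmentExpression());
            if (peek() == ']') {
                pos++;                      // "]"
            } else {
                expect(',');                // "," acting as a separator
                moreArrayLiteral();
            }
        }
    }

    // Stand-in for AssignmentExpression(): a single letter or digit (assumption).
    private String assignmentExpression() {
        char c = peek();
        if (!Character.isLetterOrDigit(c)) {
            throw new IllegalArgumentException("expected expression at " + pos);
        }
        pos++;
        return String.valueOf(c);
    }

    private char peek() {
        if (pos >= src.length()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return src.charAt(pos);
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        }
        pos++;
    }
}
```

With this sketch, parse("[1,,2]") yields three entries with a null in the middle, while parse("[1,]") yields a single entry, because the comma after an expression is consumed as a separator rather than as an undefined item.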