Progress to next loop iteration when using nullable/optional tokens


I have to write a parser, to be used in a Java app, which will accept:

  • numbers (ex: 1 2 3)
  • ranges of numbers (ex: 1-3)
  • named ranges (ex: GROUP_1_MATCHED)

Each token is separated by either:

<WHITE : ([" ", "\t"])+ >
<COMMA : (",") >
<SEMICOLON : (";") >
<EOL : ("\r" | "\n" | "\r\n") >

Everything would be easy if ranges didn't allow optional spaces, like:

1-  2
2  -3
3  -    4
4-5

Test string is this: " 1 2 3 4 5,6,7;8;9,, 10;11;;, ;,;,,;\n\n ;,,; 12,13-13, 14 - 14 15- 15 16 -16 \n17-17\n 18 - 18\n 19 - 19 \n GROUP_1_A;GROUP_1_A GROUP_1_A;GROUP_1_A,GROUP_1_A ,;;\n\n \"GROUP_1_A\" ;; 20"

I have tried several ways of defining the white space around "-", but they all ended either in an infinite nested loop, which processes the given simple string to the end and then starts again from the beginning, or in a parser that can't proceed to the next iteration. It would be easy if there was a way to check the next token without consuming it.

SKIP: {
    < QUOTATION :  ( ["\""] ) > |
    < APOSTROPHE : ( ["'"] ) >
}

TOKEN: {
    < NAME :            ( ["a"-"z", "A"-"Z"])+ (["a"-"z", "A"-"Z", "_", "0"-"9"] )* > |
    < NUM :             ( ["0"-"9"] ){1,5} > |
    < WHITE :           ( [" ", "\t"] ) > |
    < EOL :             ( "\n" | "\r" | "\r\n" ) > |
    < COMMA :           ( [","] ) > |
    < SEMICOLON :       ( [";"] ) >
}

Map<String, List<String>> parse() : {
    Map<String, List<String>> result = new HashMap<String, List<String>>();
    List<String> single = new ArrayList<String>();
    List<String> range = new ArrayList<String>();
    List<String> named = new ArrayList<String>();
    result.put(SINGLE, single);
    result.put(RANGE, range);
    result.put(NAMED, named);
    Token name = null;
    Token first = null;
    Token last = null;
}
{
    (<WHITE>)*
    (
        (name = <NAME> |
            first = <NUM>
            (LOOKAHEAD(2) (<WHITE>)* "-" (<WHITE>)* last = <NUM>)?
        )
        ((LOOKAHEAD(2) <EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+ | <EOF>)

        {
            if (name != null) {
                named.add(name.image);
            } else if (first != null && last == null) {
                single.add(first.image);
            } else if (first != null && last != null) {
                String s = first.image + " - " + last.image;
                range.add(s);
            } else {
                System.err.println("Parser error found");
            }

            name = null;
            first = null;
            last = null;
        }
    )+
    {
        return result;
    }
}

And here is the output from parsing:

Call:   parse
  Consumed token: <<WHITE>: " " at line 1 column 1>
  Consumed token: <<WHITE>: " " at line 1 column 2>
  Consumed token: <<NUM>: "1" at line 1 column 3>
  Visited token: <<WHITE>: " " at line 1 column 4>; Expected token: <<WHITE>>
  Visited token: <<NUM>: "2" at line 1 column 5>; Expected token: <<WHITE>>
  Visited token: <<NUM>: "2" at line 1 column 5>; Expected token: <"-">
  Visited token: <<WHITE>: " " at line 1 column 4>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 1 column 4>
  Consumed token: <<NUM>: "2" at line 1 column 5>
  Visited token: <<WHITE>: " " at line 1 column 6>; Expected token: <<WHITE>>
  Visited token: <<NUM>: "3" at line 1 column 7>; Expected token: <<WHITE>>
  Visited token: <<NUM>: "3" at line 1 column 7>; Expected token: <"-">
  Visited token: <<WHITE>: " " at line 1 column 6>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 1 column 6>
  Consumed token: <<NUM>: "3" at line 1 column 7>
  Visited token: <<WHITE>: " " at line 1 column 8>; Expected token: <<WHITE>>
  Visited token: <<NUM>: "4" at line 1 column 9>; Expected token: <<WHITE>>
  Visited token: <<NUM>: "4" at line 1 column 9>; Expected token: <"-">
  Visited token: <<WHITE>: " " at line 1 column 8>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 1 column 8>
  Consumed token: <<NUM>: "4" at line 1 column 9>
  Visited token: <<WHITE>: " " at line 1 column 10>; Expected token: <<WHITE>>
  Visited token: <<NUM>: "5" at line 1 column 11>; Expected token: <<WHITE>>
  Visited token: <<NUM>: "5" at line 1 column 11>; Expected token: <"-">
  Visited token: <<WHITE>: " " at line 1 column 10>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 1 column 10>
  Consumed token: <<NUM>: "5" at line 1 column 11>
  Visited token: <<COMMA>: "," at line 1 column 12>; Expected token: <<WHITE>>
  Visited token: <<COMMA>: "," at line 1 column 12>; Expected token: <"-">
  Visited token: <<COMMA>: "," at line 1 column 12>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 1 column 12>
  Consumed token: <<NUM>: "6" at line 1 column 13>
  Visited token: <<COMMA>: "," at line 1 column 14>; Expected token: <<WHITE>>
  Visited token: <<COMMA>: "," at line 1 column 14>; Expected token: <"-">
  Visited token: <<COMMA>: "," at line 1 column 14>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 1 column 14>
  Consumed token: <<NUM>: "7" at line 1 column 15>
  Visited token: <<SEMICOLON>: ";" at line 1 column 16>; Expected token: <<WHITE>>
  Visited token: <<SEMICOLON>: ";" at line 1 column 16>; Expected token: <"-">
  Visited token: <<SEMICOLON>: ";" at line 1 column 16>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 1 column 16>
  Consumed token: <<NUM>: "8" at line 1 column 17>
  Visited token: <<SEMICOLON>: ";" at line 1 column 18>; Expected token: <<WHITE>>
  Visited token: <<SEMICOLON>: ";" at line 1 column 18>; Expected token: <"-">
  Visited token: <<SEMICOLON>: ";" at line 1 column 18>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 1 column 18>
  Consumed token: <<NUM>: "9" at line 1 column 19>
  Visited token: <<COMMA>: "," at line 1 column 20>; Expected token: <<WHITE>>
  Visited token: <<COMMA>: "," at line 1 column 20>; Expected token: <"-">
  Visited token: <<COMMA>: "," at line 1 column 20>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 1 column 20>
  Visited token: <<COMMA>: "," at line 1 column 21>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 1 column 21>
  Visited token: <<WHITE>: " " at line 1 column 22>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 1 column 22>
  Visited token: <<WHITE>: " " at line 1 column 23>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 1 column 23>
  Consumed token: <<NUM>: "10" at line 1 column 24>
  Visited token: <<SEMICOLON>: ";" at line 1 column 26>; Expected token: <<WHITE>>
  Visited token: <<SEMICOLON>: ";" at line 1 column 26>; Expected token: <"-">
  Visited token: <<SEMICOLON>: ";" at line 1 column 26>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 1 column 26>
  Consumed token: <<NUM>: "11" at line 1 column 27>
  Visited token: <<SEMICOLON>: ";" at line 1 column 29>; Expected token: <<WHITE>>
  Visited token: <<SEMICOLON>: ";" at line 1 column 29>; Expected token: <"-">
  Visited token: <<SEMICOLON>: ";" at line 1 column 29>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 1 column 29>
  Visited token: <<SEMICOLON>: ";" at line 1 column 30>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 1 column 30>
  Visited token: <<COMMA>: "," at line 1 column 31>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 1 column 31>
  Visited token: <<WHITE>: " " at line 1 column 32>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 1 column 32>
  Visited token: <<WHITE>: " " at line 1 column 33>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 1 column 33>
  Visited token: <<SEMICOLON>: ";" at line 1 column 34>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 1 column 34>
  Visited token: <<COMMA>: "," at line 1 column 35>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 1 column 35>
  Visited token: <<SEMICOLON>: ";" at line 1 column 36>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 1 column 36>
  Visited token: <<COMMA>: "," at line 1 column 37>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 1 column 37>
  Visited token: <<COMMA>: "," at line 1 column 38>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 1 column 38>
  Visited token: <<SEMICOLON>: ";" at line 1 column 39>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 1 column 39>
  Visited token: <<EOL>: "\n" at line 1 column 40>; Expected token: <<EOL>>
  Consumed token: <<EOL>: "\n" at line 1 column 40>
  Visited token: <<EOL>: "\n" at line 2 column 1>; Expected token: <<EOL>>
  Consumed token: <<EOL>: "\n" at line 2 column 1>
  Visited token: <<WHITE>: " " at line 3 column 1>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 3 column 1>
  Visited token: <<WHITE>: " " at line 3 column 2>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 3 column 2>
  Visited token: <<WHITE>: " " at line 3 column 3>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 3 column 3>
  Visited token: <<SEMICOLON>: ";" at line 3 column 4>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 3 column 4>
  Visited token: <<COMMA>: "," at line 3 column 5>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 3 column 5>
  Visited token: <<COMMA>: "," at line 3 column 6>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 3 column 6>
  Visited token: <<SEMICOLON>: ";" at line 3 column 7>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 3 column 7>
  Visited token: <<WHITE>: " " at line 3 column 8>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 3 column 8>
  Visited token: <<WHITE>: " " at line 3 column 9>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 3 column 9>
  Consumed token: <<NUM>: "12" at line 3 column 10>
  Visited token: <<COMMA>: "," at line 3 column 12>; Expected token: <<WHITE>>
  Visited token: <<COMMA>: "," at line 3 column 12>; Expected token: <"-">
  Visited token: <<COMMA>: "," at line 3 column 12>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 3 column 12>
  Consumed token: <<NUM>: "13" at line 3 column 13>
  Visited token: <"-" at line 3 column 15>; Expected token: <<WHITE>>
  Visited token: <"-" at line 3 column 15>; Expected token: <"-">
  Visited token: <<NUM>: "13" at line 3 column 16>; Expected token: <<WHITE>>
  Visited token: <<NUM>: "13" at line 3 column 16>; Expected token: <<NUM>>
  Consumed token: <"-" at line 3 column 15>
  Consumed token: <<NUM>: "13" at line 3 column 16>
  Visited token: <<COMMA>: "," at line 3 column 18>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 3 column 18>
  Visited token: <<WHITE>: " " at line 3 column 19>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 3 column 19>
  Visited token: <<WHITE>: " " at line 3 column 20>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 3 column 20>
  Consumed token: <<NUM>: "14" at line 3 column 21>
  Visited token: <<WHITE>: " " at line 3 column 23>; Expected token: <<WHITE>>
  Visited token: <<WHITE>: " " at line 3 column 24>; Expected token: <<WHITE>>
  Consumed token: <<WHITE>: " " at line 3 column 23>
  Consumed token: <<WHITE>: " " at line 3 column 24>
  Consumed token: <"-" at line 3 column 25>
  Consumed token: <<WHITE>: " " at line 3 column 26>
  Consumed token: <<WHITE>: " " at line 3 column 27>
  Consumed token: <<WHITE>: " " at line 3 column 28>
  Consumed token: <<WHITE>: " " at line 3 column 29>
  Consumed token: <<NUM>: "14" at line 3 column 30>
  Visited token: <<WHITE>: " " at line 3 column 32>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 3 column 32>
  Consumed token: <<NUM>: "15" at line 3 column 33>
  Visited token: <"-" at line 3 column 35>; Expected token: <<WHITE>>
  Visited token: <"-" at line 3 column 35>; Expected token: <"-">
  Visited token: <<WHITE>: " " at line 3 column 36>; Expected token: <<WHITE>>
  Consumed token: <"-" at line 3 column 35>
  Consumed token: <<WHITE>: " " at line 3 column 36>
  Consumed token: <<NUM>: "15" at line 3 column 37>
  Visited token: <<WHITE>: " " at line 3 column 39>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 3 column 39>
  Consumed token: <<NUM>: "16" at line 3 column 40>
  Visited token: <<WHITE>: " " at line 3 column 42>; Expected token: <<WHITE>>
  Visited token: <"-" at line 3 column 43>; Expected token: <<WHITE>>
  Visited token: <"-" at line 3 column 43>; Expected token: <"-">
  Consumed token: <<WHITE>: " " at line 3 column 42>
  Consumed token: <"-" at line 3 column 43>
  Consumed token: <<NUM>: "16" at line 3 column 44>
  Visited token: <<WHITE>: " " at line 3 column 46>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 3 column 46>
  Visited token: <<EOL>: "\n" at line 3 column 47>; Expected token: <<EOL>>
  Consumed token: <<EOL>: "\n" at line 3 column 47>
  Consumed token: <<NUM>: "17" at line 4 column 1>
  Visited token: <"-" at line 4 column 3>; Expected token: <<WHITE>>
  Visited token: <"-" at line 4 column 3>; Expected token: <"-">
  Visited token: <<NUM>: "17" at line 4 column 4>; Expected token: <<WHITE>>
  Visited token: <<NUM>: "17" at line 4 column 4>; Expected token: <<NUM>>
  Consumed token: <"-" at line 4 column 3>
  Consumed token: <<NUM>: "17" at line 4 column 4>
  Visited token: <<EOL>: "\n" at line 4 column 6>; Expected token: <<EOL>>
  Consumed token: <<EOL>: "\n" at line 4 column 6>
  Visited token: <<WHITE>: " " at line 5 column 1>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 5 column 1>
  Consumed token: <<NUM>: "18" at line 5 column 2>
  Visited token: <<WHITE>: " " at line 5 column 4>; Expected token: <<WHITE>>
  Visited token: <"-" at line 5 column 5>; Expected token: <<WHITE>>
  Visited token: <"-" at line 5 column 5>; Expected token: <"-">
  Consumed token: <<WHITE>: " " at line 5 column 4>
  Consumed token: <"-" at line 5 column 5>
  Consumed token: <<WHITE>: " " at line 5 column 6>
  Consumed token: <<NUM>: "18" at line 5 column 7>
  Visited token: <<EOL>: "\n" at line 5 column 9>; Expected token: <<EOL>>
  Consumed token: <<EOL>: "\n" at line 5 column 9>
  Visited token: <<WHITE>: " " at line 6 column 1>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 6 column 1>
  Consumed token: <<NUM>: "19" at line 6 column 2>
  Visited token: <<WHITE>: " " at line 6 column 4>; Expected token: <<WHITE>>
  Visited token: <"-" at line 6 column 5>; Expected token: <<WHITE>>
  Visited token: <"-" at line 6 column 5>; Expected token: <"-">
  Consumed token: <<WHITE>: " " at line 6 column 4>
  Consumed token: <"-" at line 6 column 5>
  Consumed token: <<WHITE>: " " at line 6 column 6>
  Consumed token: <<NUM>: "19" at line 6 column 7>
  Visited token: <<WHITE>: " " at line 6 column 9>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 6 column 9>
  Visited token: <<EOL>: "\n" at line 6 column 10>; Expected token: <<EOL>>
  Consumed token: <<EOL>: "\n" at line 6 column 10>
  Visited token: <<WHITE>: " " at line 7 column 1>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 7 column 1>
  Consumed token: <<NAME>: "GROUP_1_A" at line 7 column 2>
  Visited token: <<SEMICOLON>: ";" at line 7 column 20>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 7 column 20>
  Consumed token: <<NAME>: "GROUP_1_A" at line 7 column 21>
  Visited token: <<WHITE>: " " at line 7 column 39>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 7 column 39>
  Consumed token: <<NAME>: "GROUP_1_A" at line 7 column 40>
  Visited token: <<SEMICOLON>: ";" at line 7 column 58>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 7 column 58>
  Consumed token: <<NAME>: "GROUP_1_A" at line 7 column 59>
  Visited token: <<COMMA>: "," at line 7 column 77>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 7 column 77>
  Consumed token: <<NAME>: "GROUP_1_A" at line 7 column 78>
  Visited token: <<WHITE>: " " at line 7 column 96>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 7 column 96>
  Visited token: <<WHITE>: " " at line 7 column 97>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 7 column 97>
  Visited token: <<COMMA>: "," at line 7 column 98>; Expected token: <<EOL>>
  Consumed token: <<COMMA>: "," at line 7 column 98>
  Visited token: <<SEMICOLON>: ";" at line 7 column 99>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 7 column 99>
  Visited token: <<SEMICOLON>: ";" at line 7 column 100>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 7 column 100>
  Visited token: <<EOL>: "\n" at line 7 column 101>; Expected token: <<EOL>>
  Consumed token: <<EOL>: "\n" at line 7 column 101>
  Visited token: <<EOL>: "\n" at line 8 column 1>; Expected token: <<EOL>>
  Consumed token: <<EOL>: "\n" at line 8 column 1>
  Visited token: <<WHITE>: " " at line 9 column 1>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 9 column 1>
  Visited token: <<WHITE>: " " at line 9 column 2>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 9 column 2>
  Visited token: <<WHITE>: " " at line 9 column 3>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 9 column 3>
  Consumed token: <<NAME>: "GROUP_1_A" at line 9 column 5>
  Visited token: <<WHITE>: " " at line 9 column 24>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 9 column 24>
  Visited token: <<WHITE>: " " at line 9 column 25>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 9 column 25>
  Visited token: <<SEMICOLON>: ";" at line 9 column 26>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 9 column 26>
  Visited token: <<SEMICOLON>: ";" at line 9 column 27>; Expected token: <<EOL>>
  Consumed token: <<SEMICOLON>: ";" at line 9 column 27>
  Visited token: <<WHITE>: " " at line 9 column 28>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 9 column 28>
  Visited token: <<WHITE>: " " at line 9 column 29>; Expected token: <<EOL>>
  Consumed token: <<WHITE>: " " at line 9 column 29>
  Consumed token: <<NUM>: "20" at line 9 column 30>
  Visited token: <<WHITE>: " " at line 9 column 32>; Expected token: <<WHITE>>
  Visited token: <<WHITE>: " " at line 9 column 33>; Expected token: <<WHITE>>
  Consumed token: <<WHITE>: " " at line 9 column 32>
  Consumed token: <<WHITE>: " " at line 9 column 33>
Return: parse

parsers.excel.ParseException: Encountered " <NUM> "1 "" at line 9, column 34.
Was expecting one of:
    <WHITE> ...
    "-" ...

The parser should produce output similar to this:

single = [1,2,3,4,5,6,7,8,9,10,11,12,20]
range = [13 - 13,14 - 14,15 - 15,16 - 16,17 - 17,18 - 18,19 - 19]
named = [GROUP_1_A,GROUP_1_A,GROUP_1_A,GROUP_1_A,GROUP_1_A,GROUP_1_A]

The problem occurs when the parser can't tell whether a space belongs to the optional space before a dash or is the separator that follows a complete number.

If you know of any way to modify the JavaCC grammar so that it parses the provided string correctly, it would be greatly appreciated.


Solution

  • Let's step back a bit from JavaCC and see what your grammar actually is.

    parse --> ows ( body )+
    body --> part sep
    part --> <NAME>
    part --> <NUM>
    part --> <NUM> ows "-" ows <NUM>
    sep --> (<EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+
    sep -->  EOF
    ows --> (<WHITE>)*
    

    You should check that over to make sure that (a) I haven't made any mistakes and (b) this really is the language you intended.

    I don't like the way you are dealing with EOF. It's not really a separator. I'd suggest using the following grammar, which is practically identical:

    parse --> ows body
    body --> part ( sep body | <EOF> )
    part --> <NAME>
    part --> <NUM>
    part --> <NUM> ows "-" ows <NUM>
    sep --> (<EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+
    ows --> (<WHITE>)*
    

    First solution: Syntactic lookahead

    The OP said "It would be easy if there was a way to check [the] next token without consuming it." There is. It's called syntactic lookahead.

    The only place we need lookahead is to distinguish the second and third productions for part. Let's combine them.

    part --> <NAME>
    part --> <NUM> ( ows "-" ows <NUM> )?
    

    No fixed-length lookahead can determine whether or not to take the optional path in the second production, because any number of WHITE tokens may come before the "-". So we use syntactic lookahead like this:

    part --> <NAME>
    part --> <NUM> ( LOOKAHEAD( ows "-" ) ows "-" ows <NUM> )?
    

    Now, we're done. Let's put the productions back into JavaCC:

    void parse() : { }
    {
        ows() body()
    }
    
    void body() : { }
    {
        part() ( sep() body()  | <EOF> )
    }
    
    void part() : { }
    {
       <NAME>
    |
       <NUM>
       ( LOOKAHEAD( ows() "-")
         ows() "-" ows() <NUM>
       )?
    }
    
    void sep() : {}
    {
        (<EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+
    }
    
    void ows() : {}
    {
        (<WHITE>)*
    }
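
The idea behind `LOOKAHEAD( ows() "-" )` can be sketched in plain Java: scan forward over the upcoming tokens on a copy of the position, so nothing is consumed, and report whether a "-" follows the optional white space. This is only an illustration of the concept, not the code JavaCC generates; the class name `LookaheadSketch`, the `Kind` enum, and the `dashFollows` helper are hypothetical names introduced for this example.

```java
import java.util.List;

public class LookaheadSketch {
    // Hypothetical token kinds, named after the tokens in the grammar above.
    enum Kind { NUM, WHITE, DASH, COMMA, EOF }

    // Returns true if the tokens starting at pos match: ows "-"
    // (zero or more WHITE followed by DASH). The caller's position is
    // untouched because pos is a local copy, so no token is consumed.
    static boolean dashFollows(List<Kind> tokens, int pos) {
        while (pos < tokens.size() && tokens.get(pos) == Kind.WHITE) {
            pos++;
        }
        return pos < tokens.size() && tokens.get(pos) == Kind.DASH;
    }

    public static void main(String[] args) {
        // "14  - 14": after the first NUM, the scan skips two WHITE tokens and finds DASH.
        List<Kind> range = List.of(Kind.NUM, Kind.WHITE, Kind.WHITE, Kind.DASH,
                                   Kind.WHITE, Kind.NUM, Kind.EOF);
        // "1 2": after the first NUM, the scan skips one WHITE token and finds NUM.
        List<Kind> singles = List.of(Kind.NUM, Kind.WHITE, Kind.NUM, Kind.EOF);

        System.out.println(dashFollows(range, 1));   // true
        System.out.println(dashFollows(singles, 1)); // false
    }
}
```

The scan may look arbitrarily far ahead, which is exactly why a fixed `LOOKAHEAD(2)` was not enough in the question's grammar.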
    

    Second solution: LL(1)

    Could we solve it with an LL(1) grammar? Yes. Let's go back to the original grammar, or rather the grammar that takes the EOF out of the loop.

    parse --> ows body
    body --> part (sep body | <EOF>)
    part --> <NAME>
    part --> <NUM> ( ows "-" ows <NUM> )?
    sep --> (<EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+
    ows --> (<WHITE>)*
    

    Inline part and introduce nonterminal afternum

    parse --> ows body
    body --> <NAME> (sep body | <EOF>)
    body --> <NUM> afternum
    afternum --> ( ows "-" ows <NUM> )? (sep body | <EOF>)
    sep --> (<EOL> | <COMMA> | <SEMICOLON> | <WHITE>)+
    ows --> (<WHITE>)*
    

    Now the problem is in afternum.

    When we start to parse afternum there are 5 possibilities to consider. (i) The next token is a "-". (ii) The next token is an EOL, COMMA, or SEMICOLON. (iii) The next token is a white space. (iv) The next token is an EOF. (v) In any other case, we have an error.

    In case (ii) this can't be the last part. In case (iii), the WHITE we just saw could have been the first character of a sep or it might be leading to a hyphen. We make a new nonterminal to deal with both possibilities.

    afternum --> "-" ows <NUM> (sep body | <EOF>)
    afternum --> nonwssep (sep)? body
    afternum --> <WHITE> moreafternum
    afternum --> EOF
    
    moreafternum --> ows "-" ows <NUM> (sep body | EOF)
                   | sep? body
    
    nonwssep --> <EOL> | <COMMA> | <SEMICOLON>
    

    Now the problem is in moreafternum, since, if the next token is WHITE, either choice is viable.

    Let's manipulate moreafternum a bit. The goal is to expose that WHITE token so we can factor it out.

        moreafternum
    = By definition
        ows "-" ows <NUM> (sep body | EOF) | sep? body
    = Expand the ?
          ows "-" ows <NUM> (sep body | EOF)
        | body
        | sep body
    = Expand first `ows` and split white from other cases
          "-" ows <NUM> (sep body | EOF)
        | WHITE ows "-" ows <NUM> (sep body | EOF)
        | body
        | sep body
    = Expand the `sep` in the fourth case
          "-" ows <NUM> (sep body | EOF)
        | WHITE ows "-" ows <NUM> (sep body | EOF)
        | body
        | (WHITE | nonwssep) sep? body
    = Split the fourth case
          "-" ows <NUM> (sep body | EOF)
        | WHITE ows "-" ows <NUM> (sep body | EOF)
        | body
        | WHITE sep? body
        | nonwssep sep? body
    = Duplicate the fourth choice
          "-" ows <NUM> (sep body | EOF)
        | WHITE ows "-" ows <NUM> (sep body | EOF)
        | WHITE sep? body
        | body
        | WHITE sep? body
        | nonwssep sep? body
    = Combine the second and third choices.
          "-" ows <NUM> (sep body | EOF)
        | WHITE ( ows "-" ows <NUM> (sep body | EOF) | sep? body )
        | body
        | WHITE sep? body
        | nonwssep sep? body
    = combine the third, fourth, and fifth choices
          "-" ows <NUM> (sep body | EOF)
        | WHITE ( ows "-" ows <NUM> (sep body | EOF) | sep? body)
        | sep? body
    = Definition of moreafternum
          "-" ows <NUM> (sep body | EOF)
        | WHITE moreafternum
        | sep? body
    

    Now we can redefine moreafternum with this recursive version

    moreafternum --> "-" ows <NUM> (sep body | EOF)
                   | <WHITE> moreafternum
                   | sep? body
    

    If we code this production in JavaCC, there will still be a choice conflict between the second and third choices when the next token is WHITE. JavaCC will prefer the second over the third, which is what we want. If you don't like the warning, you can suppress it with a LOOKAHEAD. Note that this LOOKAHEAD will not change the Java code produced; it simply suppresses the warning.

    void moreafternum() : {} {
           "-" ows() <NUM> (sep() body() | <EOF>)
    |
           // LOOKAHEAD( <WHITE> ) // Optional lookahead to suppress the warning
           <WHITE> moreafternum()
    |
           ( sep() )? body()
    }
    

    We can go all the way to LL(1) by taking another look at moreafternum.

         moreafternum
    =  From above
          "-" ows <NUM> (sep body | EOF)
        | WHITE ( ows "-" ows <NUM> (sep body | EOF) | sep? body)
        | body
        | WHITE sep? body
        | nonwssep sep? body
    = Fourth choice is subsumed by the second.
          "-" ows <NUM> (sep body | EOF)
        | WHITE ( ows "-" ows <NUM> (sep body | EOF) | sep? body)
        | body
        | nonwssep sep? body
    = Combine last two choices
          "-" ows <NUM> (sep body | EOF)
        | WHITE ( ows "-" ows <NUM> (sep body | EOF) | sep? body)
        | (nonwssep sep?)? body
    = Original definition of moreafternum
          "-" ows <NUM> (sep body | EOF)
        | WHITE moreafternum
        | (nonwssep sep?)? body
    

    Putting it all together we get

    parse --> ows body 
    
    body --> <NAME> (sep body | <EOF>)
    body --> <NUM> afternum
    
    afternum --> "-" ows <NUM> (sep body | <EOF>)
    afternum --> <WHITE> moreafternum
    afternum --> nonwssep (sep)? body
    afternum --> EOF
    
    moreafternum --> "-" ows <NUM> (sep body | EOF)
    moreafternum --> <WHITE> moreafternum
    moreafternum --> ( nonwssep (sep)? )? body
    
    nonwssep --> <EOL> | <COMMA> | <SEMICOLON>
    
    sep --> (nonwssep | <WHITE>)+
    
    ows --> (<WHITE>)*
    

    This is LL(1), so you can translate it to JavaCC with no lookahead.
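
To see that one token of lookahead really is enough, the grammar above can also be written by hand as a recursive-descent parser in plain Java. The following is a minimal sketch under stated assumptions, not JavaCC output: the class name `RangeListParser`, the `Kind` enum, the regex-based lexer, and helpers such as `optSep` and `restOfRange` are all hypothetical names chosen for this example. Every decision inspects only `peek()`, the single next token, and the tail-recursive `moreafternum --> <WHITE> moreafternum` is written as a loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeListParser {
    // Token kinds mirror the grammar; SKIP stands for the quote characters that are dropped.
    enum Kind { NAME, NUM, WHITE, EOL, COMMA, SEMICOLON, DASH, SKIP, EOF }

    private static final Pattern LEXER = Pattern.compile(
        "(?<NAME>[A-Za-z][A-Za-z_0-9]*)|(?<NUM>[0-9]{1,5})|(?<WHITE>[ \\t])"
        + "|(?<EOL>\\r\\n|\\r|\\n)|(?<COMMA>,)|(?<SEMICOLON>;)|(?<DASH>-)|(?<SKIP>[\"'])");

    final List<String> single = new ArrayList<>();
    final List<String> range = new ArrayList<>();
    final List<String> named = new ArrayList<>();
    private final List<Kind> kinds = new ArrayList<>();
    private final List<String> images = new ArrayList<>();
    private int pos = 0;

    RangeListParser(String input) {
        Matcher m = LEXER.matcher(input);
        for (int at = 0; at < input.length(); at = m.end()) {
            m.region(at, input.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("Bad character at offset " + at);
            for (Kind k : Kind.values()) {
                if (k != Kind.EOF && m.group(k.name()) != null) {
                    if (k != Kind.SKIP) { kinds.add(k); images.add(m.group()); }
                    break;
                }
            }
        }
        kinds.add(Kind.EOF);
        images.add("");
    }

    private Kind peek() { return kinds.get(pos); }
    private boolean isNonWsSep(Kind k) { return k == Kind.EOL || k == Kind.COMMA || k == Kind.SEMICOLON; }
    private void ows() { while (peek() == Kind.WHITE) pos++; }                          // (<WHITE>)*
    private void optSep() { while (peek() == Kind.WHITE || isNonWsSep(peek())) pos++; } // (sep)?
    void parse() { ows(); body(); }                                                     // parse --> ows body

    private String expect(Kind k) {
        if (peek() != k) throw new IllegalStateException("Expected " + k + ", found " + peek());
        return images.get(pos++);
    }

    private void body() {
        if (peek() == Kind.NAME) {                      // body --> <NAME> (sep body | <EOF>)
            named.add(expect(Kind.NAME));
            sepBodyOrEof();
        } else {                                        // body --> <NUM> afternum
            afternum(expect(Kind.NUM));
        }
    }

    private void sepBodyOrEof() {                       // (sep body | <EOF>)
        if (peek() == Kind.EOF) return;
        if (peek() != Kind.WHITE && !isNonWsSep(peek()))
            throw new IllegalStateException("Expected separator or EOF, found " + peek());
        optSep();
        body();
    }

    private void afternum(String first) {
        switch (peek()) {
            case DASH:  restOfRange(first); break;          // "-" ows <NUM> (sep body | <EOF>)
            case WHITE: pos++; moreafternum(first); break;  // <WHITE> moreafternum
            case EOF:   single.add(first); break;           // <EOF>
            default:                                        // nonwssep (sep)? body
                if (!isNonWsSep(peek())) throw new IllegalStateException("Unexpected " + peek());
                single.add(first);
                optSep();
                body();
        }
    }

    private void moreafternum(String first) {
        ows();                                              // <WHITE> moreafternum, as a loop
        if (peek() == Kind.DASH) { restOfRange(first); return; }
        single.add(first);                                  // ( nonwssep (sep)? )? body
        optSep();
        body();
    }

    private void restOfRange(String first) {                // "-" ows <NUM> (sep body | <EOF>)
        expect(Kind.DASH);
        ows();
        range.add(first + " - " + expect(Kind.NUM));
        sepBodyOrEof();
    }

    public static void main(String[] args) {
        RangeListParser p = new RangeListParser("1 2,3 - 4;GROUP_1_A\n5- 6 7");
        p.parse();
        System.out.println(p.single + " " + p.range + " " + p.named);
    }
}
```

Running `main` should print `[1, 2, 7] [3 - 4, 5 - 6] [GROUP_1_A]`. As in the grammar, a separator must be followed by another body, so input that ends in trailing separators is rejected.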