Tags: javascript, parsing, parser-generator, bnf, jison

How do you match zero or more tokens in Jison?


I'm writing a simple expression parser in Jison that allows an arbitrary number of newlines to follow a binary operator in an expression. This is my grammar so far:

{
    "operators": [
        ["left", "+", "-"],
        ["left", "*", "/", "%"]
    ],
    "bnf": {
        "program": [
            ["statement EOF", "return $1;"]
        ],
        "statement": [
            ["expression newlines", "$$ = $1 + ';';"]
        ],
        "expression": [
            ["NUMBER",                           "$$ = yytext;"],
            ["expression + expression",          "$$ = $1 + ' + ' + $3;"],
            ["expression - expression",          "$$ = $1 + ' - ' + $3;"],
            ["expression * expression",          "$$ = $1 + ' * ' + $3;"],
            ["expression / expression",          "$$ = $1 + ' / ' + $3;"],
            ["expression % expression",          "$$ = $1 + ' % ' + $3;"],
            ["expression + newlines expression", "$$ = $1 + ' + ' + $4;"],
            ["expression - newlines expression", "$$ = $1 + ' - ' + $4;"],
            ["expression * newlines expression", "$$ = $1 + ' * ' + $4;"],
            ["expression / newlines expression", "$$ = $1 + ' / ' + $4;"],
            ["expression % newlines expression", "$$ = $1 + ' % ' + $4;"]
        ],
        "newlines": [
            ["NEWLINE",          ""],
            ["newlines NEWLINE", ""]
        ]
    }
}

As you can see, I'm writing two rules for every binary operator, which seems very redundant. I would rather have a production that matches zero or more NEWLINE tokens (Kleene star) instead of one or more (Kleene plus). How would you do this in Jison?


Solution

  • I use Jison and ignore whitespace (including newlines) in the lexer.

    The first rule in my %lex section is:

    \s+   /* ignore */
    
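    In the JSON grammar format used in the question, the same idea might look roughly like the sketch below. The question does not show its lexer, so the number pattern and the token names here are assumptions modelled on the usual Jison calculator example. Note that with this rule the lexer never emits NEWLINE, so the "statement" rule would have to stop expecting "newlines".

    ```json
    "lex": {
        "rules": [
            ["\\s+",                    "/* skip whitespace, including newlines */"],
            ["[0-9]+(?:\\.[0-9]+)?\\b", "return 'NUMBER';"],
            ["\\+",                     "return '+';"],
            ["-",                       "return '-';"],
            ["\\*",                     "return '*';"],
            ["\\/",                     "return '/';"],
            ["%",                       "return '%';"],
            ["$",                       "return 'EOF';"]
        ]
    }
    ```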

    But you don't have to do it that way. If you'd rather keep NEWLINE as a token, try something along these lines:

    "expression": [
        ["NUMBER",                  "$$ = yytext;"],
        ["expression + expression", "$$ = $1 + ' + ' + $3;"],
        ["expression - expression", "$$ = $1 + ' - ' + $3;"],
        ["expression * expression", "$$ = $1 + ' * ' + $3;"],
        ["expression / expression", "$$ = $1 + ' / ' + $3;"],
        ["expression % expression", "$$ = $1 + ' % ' + $3;"],
        ["expression newlines",     "$$ = $1;"],
        ["newlines expression",     "$$ = $2;"]
    ],
    

    That should allow any number of newlines before or after any expression, so the separate "operator newlines" rules are no longer needed.
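
    If you want the literal Kleene star the question asks about, another option is a nullable nonterminal: one alternative is empty and the other is left-recursive. This is only a sketch. The name newlines_opt is made up for illustration, and it relies on Jison accepting an empty string as an empty right-hand side in the JSON format, which I believe it does. Because the newlines are only allowed right after an operator, it may also avoid the conflict warnings that the two extra expression rules can trigger.

    ```json
    "expression": [
        ["NUMBER",                               "$$ = yytext;"],
        ["expression + newlines_opt expression", "$$ = $1 + ' + ' + $4;"],
        ["expression - newlines_opt expression", "$$ = $1 + ' - ' + $4;"],
        ["expression * newlines_opt expression", "$$ = $1 + ' * ' + $4;"],
        ["expression / newlines_opt expression", "$$ = $1 + ' / ' + $4;"],
        ["expression % newlines_opt expression", "$$ = $1 + ' % ' + $4;"]
    ],
    "newlines_opt": [
        ["",                     ""],
        ["newlines_opt NEWLINE", ""]
    ]
    ```

    Each operator rule still ends its token sequence with the operator itself, so the precedence declared in "operators" should keep applying. The existing "newlines" rule can stay as it is for terminating a statement.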