I am writing a simple Jison grammar to get some experience before starting a more complex project. The grammar describes a comma-separated list of numeric ranges, where a range whose start and end values are the same can be written as a single number. However, when I run the generated parser on some test input, I get an error that does not make a lot of sense to me. Here is the grammar I came up with:
/* description: Parses a comma-separated list of numeric ranges. */
/* lexical grammar */
%lex
%%
\s+ /* skip whitespace */
[0-9]+ {return 'NUMBER'}
"-" {return '-'}
"," {return ','}
<<EOF>> {return 'EOF'}
. {return 'INVALID'}
/lex
/* operator associations and precedence */
%start ranges
%% /* language grammar */
ranges
: e EOF
{return $1;}
;
e : rng { $$ = $1;}
| e ',' e {alert('e,e');$$ = new Array(); $$.push($1); $$.push($3);}
;
rng
: NUMBER '-' NUMBER
{$$ = new Array(); var rng = {Start:$1, End: $3 }; $$.push(rng); }
| NUMBER
{$$ = new Array(); var rng = {Start:$1, End: $1 }; $$.push(rng);}
;
NUMBER: {$$ = Number(yytext);};
The test input is:
5-10,12-16
The output is:
Parse error on line 1:
5-10,12-16
^
Expecting '-', 'EOF', ',', got '8'
If I put an 'a' at the front, I get the expected error about finding 'INVALID', but I don't have an '8' in the input string, so I am wondering whether this is an internal state.
I am using the online parser generator at: http://zaach.github.io/jison/try/
Any thoughts?
This production is confusing Jison (and it confused me, too :) ):
NUMBER: {$$ = Number(yytext);};
NUMBER is supposed to be a terminal, but the above production declares it as a non-terminal with an empty body. Since it can match nothing, it immediately matches, and your grammar doesn't allow two consecutive NUMBERs. Hence the error.
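If the goal of that production was to turn the matched text into a number, one option is to drop it and do the conversion in the rng actions instead. This is only a sketch, assuming NUMBER stays a plain lexer token so that $1 and $3 hold its matched text; the array literal is just shorthand for the new Array() / push pair in the question:

```
/* NUMBER is only a token here; there is no NUMBER production. */
rng
    : NUMBER '-' NUMBER
        { $$ = [{ Start: Number($1), End: Number($3) }]; }
    | NUMBER
        { $$ = [{ Start: Number($1), End: Number($1) }]; }
    ;
```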
Also, your grammar is ambiguous, although I suppose Jison's default will solve the issue. It would be better to be explicit, though, since it's easy. Your rule:
e : rng
| e ',' e
does not specify how ',' "associates": in other words, whether rng , rng , rng should be considered as e , rng or rng , e. The first one is probably better for you, so you should write it explicitly:
e : rng
| e ',' rng
One big advantage of the above is that you don't need to create a new array in the second production; you can just push $3 onto the end of $1 and set $$ to $1.
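As a sketch of that rewrite, assuming rng keeps returning a one-element array as in the question (so the range object itself is $3[0]; if rng returned the bare object instead, you would push $3 directly):

```
/* Left-recursive list: each new range is appended to the existing array. */
e
    : rng
        { $$ = $1; }
    | e ',' rng
        { $1.push($3[0]); $$ = $1; }
    ;
```

With this shape, the list is built left to right in a single array, and the ranges rule can return it unchanged.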