
Multiplication with juxtaposed terms in Jison?


I've recently been experimenting with Jison, and I thought I would try to create a grammar which is able to (at least partially) parse some math expressions.

However, I'm not sure how to write a rule that allows multiplication of the form 7a (for example), where a is a previously defined variable. I tried to do this with adjmul in my code, but the parser only works when there is a space between 7 and a. In short, how can I write a rule (or rules) that multiplies a number and a variable when they are adjacent in the input?

Since I'm kind of new to parsers like Jison in general, I'd also like to know if there is any way to simplify my current rules.

Code:

/* description: Parses math files. */

/* lexical grammar */
%lex
%%
[\n;]                 {return 'NL';}
\s+                   {/* skip whitespace */}
"="                   {return '=';}
[0-9]+("."[0-9]+)?\b  {return 'NUMBER';}
"*"                   {return '*';}
"/"                   {return '/';}
"-"                   {return '-';}
"+"                   {return '+';}
"^"                   {return '^';}
"("                   {return '(';}
")"                   {return ')';}
[a-zA-Z]+             {return 'ID';}
","                   {return ',';}
"|"                   {return '|';}
"!"                   {return '!';}
<<EOF>>               {return 'EOF';}

/lex

/* operator associations and precedence */

%left '|'
%left ','
%left '+' '-'
%left '*' '/'
%left '^'
%left UMINUS
%left '!'

%start program

%% /* language grammar */


program
    : statement 'NL' program
        {$$ = "";}
    | statement EOF
        {$$ = "";}
    ;

statement
    : e
        {$$ = $1;}
    | ID '=' e
        {identifiers[$1] = $3;}
    ;

e
    : block '+' block
        {$$ = $1 + $3;}
    | block '-' block
        {$$ = $1 - $3;}
    | block '*' block
        {$$ = $1 * $3;}
    | block '/' block
        {$$ = $1 / $3;}
    | block '^' block
        {$$ = Math.pow($1, $3);}
    | '-' block %prec UMINUS
        {$$ = -$2;}
    | block '!'
        {$$ = util.factorial($1);}
    | ID '(' csv ')'
        {$$ = identifiers[$1].apply(null, $3);}
    | ID '(' ')'
        {$$ = identifiers[$1]();}
    | ID
        {$$ = identifiers[$1];}
    | adjmul
    | block
    ;

block
    : term
    | NUMBER
        {$$ = Number(yytext);}
    ;

term
    : '(' e ')'
        {$$ = $2;}
    ;

adjmul
    : block term
        {$$ = $1 * $2;}
    | block ID
        {$$ = $1 * identifiers[$2];}
    ;

Solution

  • The regexp for NUMBER is incorrect: it should not end with \b. With \b in place, the lexer cannot match the 7 in 7a, because there is no word boundary between a digit and a letter, and no other rule matches a digit, so the input is rejected before the grammar ever sees it.

    In general, lexical analysis should only split the input stream of characters into tokens, with no regard for whether those tokens appear in a valid sequence. Deciding what counts as a valid sequence of tokens is the job of the grammar rules. As in your code, whitespace is usually discarded during lexical analysis unless it is significant, in which case you would tokenise it too. So '123 foo' and '123foo' both produce the token sequence NUMBER followed by ID, which in most expression grammars is invalid with or without the space.

    In your case, removing the \b may fix the problem, but the grammar will then likely accept both '7a' (no space) and '7 a' (with a space). If you do not want to permit the space, I'd be tempted to introduce a new lexical token made up of both a number and a word, and have the grammar treat it appropriately. This keeps the concept of whitespace out of your grammar.
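The effect of \b on tokenising can be checked without Jison. The sketch below is a toy first-match-wins tokenizer in plain JavaScript, not Jison's lexer. The helper names tokenize and tryTokenize and the combined token name NUMBER_ID are assumptions made for illustration. It assumes, as in Jison's default (non-flex) mode, that the first rule that matches wins, which is why the combined rule is listed before NUMBER.

```javascript
// Toy tokenizer: tries each [pattern, tokenName] rule in order and takes the
// first one that matches at the start of the remaining input.
function tokenize(input, rules) {
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const [pattern, name] of rules) {
      const m = pattern.exec(rest);
      if (m) {
        if (name !== null) tokens.push(name); // null means "skip this text"
        rest = rest.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error('Unrecognized text: ' + rest);
  }
  return tokens;
}

// Returns the token names, or the string 'lexical error' if no rule matches.
function tryTokenize(input, rules) {
  try {
    return tokenize(input, rules);
  } catch (e) {
    return 'lexical error';
  }
}

const WS = [/^\s+/, null];
const ID = [/^[a-zA-Z]+/, 'ID'];
// The NUMBER rule from the question, with the trailing \b.
const NUMBER_WITH_B = [/^[0-9]+(\.[0-9]+)?\b/, 'NUMBER'];
// The same rule with \b removed.
const NUMBER = [/^[0-9]+(\.[0-9]+)?/, 'NUMBER'];
// Hypothetical combined token: a number immediately followed by a name.
const NUMBER_ID = [/^[0-9]+(\.[0-9]+)?[a-zA-Z]+/, 'NUMBER_ID'];

console.log(tryTokenize('7a', [WS, NUMBER_WITH_B, ID]));      // 'lexical error'
console.log(tryTokenize('7 a', [WS, NUMBER_WITH_B, ID]));     // [ 'NUMBER', 'ID' ]
console.log(tryTokenize('7a', [WS, NUMBER, ID]));             // [ 'NUMBER', 'ID' ]
console.log(tryTokenize('7 a', [WS, NUMBER, ID]));            // [ 'NUMBER', 'ID' ]
console.log(tryTokenize('7a', [WS, NUMBER_ID, NUMBER, ID]));  // [ 'NUMBER_ID' ]
console.log(tryTokenize('7 a', [WS, NUMBER_ID, NUMBER, ID])); // [ 'NUMBER', 'ID' ]
```

With the combined token, only the unspaced form yields NUMBER_ID, so a grammar that has a rule for NUMBER_ID but none for NUMBER followed by ID would accept 7a and reject 7 a. The action for such a rule would still need to split the matched text back into its numeric and name parts.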