flex/bison - instruction execution delayed


Having issues with flex/bison. I execute one statement and the result appears only after I execute a second statement. Why?

Here's what I want:

d>5
3 = 5
5+6=
11
PR 2+3
5
d>5
3 = 5

Here's what I get (notice the bottom part of the result):

d>5
3 = 5
5+6=
11
PR 2+3

d>5
53 = 5

Here's the flex:

%{
#include "calc.tab.h"
#include <stdlib.h>
%}

%%
[ ]     {}
[0-9]+  { yylval = atoi( yytext ); return NUM; }
[a-z]   { yylval = yytext[0] - 'a'; return NAME; }
.       { return (int) yytext[0]; }
"PR"    { return PR; }
%%

int yywrap(void)
{
  return 1;
}

Here's the yacc/bison:

/* Infix notation calculator--calc */

%{
#define YYDEBUG 1
#define YYSTYPE int
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

/* prototypes */
void yyerror (char* s);
int yylex(void);
void storeVar (char vName, int vValue);
void printVar (int value);
int sym[30];
%}

/* BISON Declarations */

%start input  /* what rule starts */

%token NUM
%token NAME
%token PR

%left '-' '+' /* these are declared to set precedence */
%left '*' '/'
%right '^'   /* exponentiation        */

/* Grammar follows */
%%
input:    /* empty string */
        | input line
;

/* line:     '\n' */
/*        | exp '\n'  { printf ("\t%.10g\n", $1); } */

line:     '\n'              { printf(" ");  }
        | var '>' NUM       { printf("%d %s %d", $1, "=", $3); }
        | PR exp            { printVar($2); }
        | exp '='           { printf("%d", $1); }
;

exp:      NUM               { $$ = $1;          }
        | exp '+' exp       { $$ = $1 + $3;     }
        | exp '-' exp       { $$ = $1 - $3;     }
        | exp '*' exp       { $$ = $1 * $3;     }
        | exp '/' exp       { $$ = $1 / $3;     }
        | exp '^' exp       { $$ = pow ($1, $3); }
        | '(' exp ')'       { $$ = $2;          }
;

var:      NAME              { $$ = $1;          }

%%

int main ()
{
  yyparse ();
}

void yyerror (char *s)  /* Called by yyparse on error */
{
  printf ("%s\n", s);
}

void storeVar (char vName, int vValue)
{
    sym[(int)vName-97] = vValue;
}

void printVar (int value)
{
    printf("%d", value);
    //printf("%d", array[0]);
}

Solution

  • There are two issues with that calculator definition, which in combination are a bit confusing.

    First, the scanner never returns the token \n because no rule matches \n. In flex, the . regex character matches "any character except newline" (see the flex manual). So when the scanner sees a \n, it takes the default action, which is ECHO, and then immediately reads the next token, which causes the next input line to be read. Consequently, when you type PR 2+3 followed by a newline, the first thing you see is that the newline is echoed, resulting in the blank line, and another line is read. The scanner passes to the parser, in turn, the tokens PR NUM + NUM NAME. Since PR NUM + NUM is a valid line and the following NAME cannot be shifted, the production line: PR exp is now reduced, causing the action { printVar($2); } to be executed.

    And here is the second problem: none of your printf statements print a newline. So that action simply outputs the character 5. Subsequently, the line d>5 is reduced and 3 = 5 is printed. Finally, the scanner is called again, at which point it scans the \n and echoes it.

    It's not generally a good idea to mix output from the scanner and the parser, because the order of execution is not obvious. Sometimes the parser requires a lookahead token before reducing, and sometimes it doesn't, so you cannot easily tell whether output from the scanner will come before or after the output from the reduction preceding the scanned token. I don't suppose that was your intention, anyway, but it's still worth saying. On the whole, I would avoid the ECHO action entirely, including as a default action, unless you're writing a transducer using only flex.

    Fixing the flex input is easy; you simply need to explicitly add \n to your catchall rule:

    .|\n       { return yytext[0]; }
    

    The cast to int is pointless. If your intention was to safeguard against returning a negative signed character, you should use:

    .|\n       { return (unsigned char)yytext[0]; }
    

    But if you just make that change, you'll find that your output gets somewhat scrambled, because it never outputs newlines. I'd just add \n to every format, but you might have different needs.

    Once you make that change, you'll also find that you can no longer split expressions across lines. You could fix that by ignoring newlines in the scanner, but here you have to make a choice, because if expressions can be split across lines, then it is impossible to know where an expression ends until the next token is read, and you will be back to "delayed" execution.