
Adding functions to a yacc calculator


I'm working on a simple calculator in lex/yacc, and I'm trying to have some functions return doubles rather than integers.

I initially had everything under expr. After finding out that I can't typecast $$, I introduced a new type, DECIMAL, and added it as a new grammar rule.

Now any function call produces a syntax error and terminates the program.

My code:

%{
#define PI 3.14159265358979
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int regs[26];
int base;
int yylex();
int yyerror(char *s);
int yywrap();
%}

%start list
%union {
  int a;
  double b;
  char c;
}
%type <a> expr number DIGIT
%type <c> LETTER
%type <b> DECIMAL
%token DIGIT LETTER
%token EXIT
%token SIN COS TAN SQRT LOG LN
%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS
%right '^'

%nonassoc SIN COS TAN SQRT LOG LN


%%

list: /* empty */
    | list stat '\n'
    | list error '\n' {
      yyerrok;
    };
    | list EXIT {
      exit(EXIT_SUCCESS);
    }
    | DECIMAL ;

stat: expr {
        printf("%d\n", $1);
      }
    | LETTER '=' expr {
      regs[$1] = $3;
    };

expr: '(' expr ')' {
        $$ = $2;
      }
    | expr '*' expr {
        $$ = $1 * $3;
      }
    | expr '/' expr {
        $$ = $1 / $3;
      }
    | expr '%' expr {
        $$ = $1 % $3;
      }
    | expr '+' expr {
        $$ = $1 + $3;
      }
    | expr '-' expr {
        $$ = $1 - $3;
      }
    | expr '&' expr {
        $$ = $1 & $3;
      }
    | expr '|' expr {
        $$ = $1 | $3;
      }
    | '-' expr %prec UMINUS {
        $$ = -$2;
      }
    
    | expr '^' expr {
        $$ = pow($1, $3);
      }

    | LETTER {
        $$ = regs[$1];
      }

    | number;

DECIMAL:
    SIN DECIMAL {
      $$ = sin($2 * PI / 180);
    }

    | COS DECIMAL {
      $$ = cos($2 * PI / 180);
    }

    | TAN DECIMAL {
      $$ = tan($2 * PI / 180);
    }

    | SQRT DECIMAL {
      $$ = sqrt($2);
    }

    | LOG DECIMAL {
      $$ = log10($2);
    }

    | LN DECIMAL {
      $$ = log($2);
    };



number: DIGIT {
          $$ = $1;
          base = ($1 == 0) ? 8 : 10;
        }
      | number DIGIT {
          $$ = base * $1 + $2;
        };

%%

int main() {
  return yyparse();
}

int yyerror(char *s) {
  fprintf(stderr, "%s\n", s);
  return 1;
}

int yywrap() {
  return 1;
}

And, if it helps, here's my lex code:

%{
#include <stdio.h>
#include "y.tab.h"

int c;
extern YYSTYPE yylval;
%}

%%

" ";


[a-z] {
  c = yytext[0];
  yylval.a = c - 'a';
  return(LETTER);
}

[0-9] {
  c = yytext[0];
  yylval.a = c - '0';
  return(DIGIT);
}

[^a-z0-9\b] {
  c = yytext[0];
  return(c);
}

sin {
  return SIN;
}

cos {
  return COS;
}

tan {
  return TAN;
}

sqrt {
  return SQRT;
}

log {
  return LOG;
}

ln {
  return LN;
}

exit {
  return EXIT;
}

Solution

  • Why is the non-terminal DECIMAL written in all caps? Is it because you couldn't decide whether it was a terminal or a non-terminal?

    Anyway, no derivation can produce a sentential form containing DECIMAL, because the only non-terminal which can produce DECIMAL is DECIMAL itself. Furthermore, every production for DECIMAL contains another instance of DECIMAL, so DECIMAL can never derive a string consisting only of tokens.

    In other words, DECIMAL is both unreachable and non-productive, making it doubly useless. That also makes the tokens for the various functions unreachable, and bison warns you about exactly that. If you use the compiled parser despite the warning, you will naturally find that any use of one of those functions is a syntax error: no valid input can contain them.

    Every parser symbol must have a single value type, and moreover the datatype of a variable is not syntactic; that is, the parser cannot know whether x, say, contains an integer or a floating point value. (Unless you divide variable names into alphabetic ranges, so that, for example, x is always a double and n is always an integer, but that idea went out of fashion decades ago.)

    So you have some alternatives.

    1. You can make all your values doubles, like JavaScript does. That works because every 32-bit int can be precisely represented as a double; in fact, a double can exactly represent any integer whose magnitude fits in 53 bits. (But bitwise operations will require some work, and division becomes ambiguous, which is why Python has two different division operators.)

    2. You can make all your values a "discriminated union"; that is, a struct containing an enum which indicates the datatype and a union which contains the data value.

    3. You can build an AST instead of directly evaluating the input. Once you've constructed the AST, you can figure out the datatype of each node with a depth-first walk, and introduce type conversions where necessary. Then you can evaluate the expression (or even compile it) using another depth-first walk. But you'll still need something akin to a discriminated union in order to correctly type the variables.
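    To make option 1 concrete, here is a small standalone C sketch (not part of the calculator itself) demonstrating both halves of the claim: integers round-trip through double exactly up to 2^53, and bitwise operators can still be supported by detouring through an integer type:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* Every 32-bit int survives a round trip through double. */
        int32_t n = 2147483647;
        assert((int32_t)(double)n == n);

        /* Exactness ends at 2^53: adding 1 to 2^53 rounds back to 2^53,
           while 2^53 - 1 is still represented exactly. */
        double big = 9007199254740992.0;   /* 2^53 */
        assert(big + 1.0 == big);
        assert(big - 1.0 != big);

        /* Bitwise operators need an explicit detour through an integer type. */
        double a = 12.0, b = 10.0;
        printf("%g\n", (double)((int64_t)a & (int64_t)b));   /* prints 8 */
        return 0;
    }
    ```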
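    For option 2, here is a minimal C sketch of a discriminated union. The Value type and value_add helper are hypothetical names invented for illustration; in the calculator, Value would become the single %union member (or YYSTYPE itself), and each semantic action would dispatch on the tag:

    ```c
    #include <stdio.h>

    /* The discriminant: which member of the union is live. */
    typedef enum { V_INT, V_DOUBLE } Tag;

    typedef struct {
        Tag tag;
        union { int i; double d; } u;
    } Value;

    /* Stay in int arithmetic when possible; promote to double otherwise. */
    static Value value_add(Value a, Value b) {
        Value r;
        if (a.tag == V_INT && b.tag == V_INT) {
            r.tag = V_INT;
            r.u.i = a.u.i + b.u.i;
        } else {
            r.tag = V_DOUBLE;
            r.u.d = (a.tag == V_INT ? a.u.i : a.u.d)
                  + (b.tag == V_INT ? b.u.i : b.u.d);
        }
        return r;
    }

    int main(void) {
        Value x = { V_INT,    { .i = 2   } };
        Value y = { V_DOUBLE, { .d = 1.5 } };
        Value z = value_add(x, y);
        printf("%s %g\n", z.tag == V_DOUBLE ? "double" : "int", z.u.d);
        return 0;
    }
    ```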
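    And for option 3, a toy AST with a depth-first evaluator. The Node type and its helpers are illustrative only; in a real parser the semantic actions would build these nodes instead of computing values, and separate walks would do type-checking and evaluation:

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    /* A leaf holds a number (op == 0); an interior node holds an operator. */
    typedef struct Node {
        char op;                  /* 0 for a leaf, otherwise '+', '*', ... */
        double value;             /* meaningful only for leaves */
        struct Node *left, *right;
    } Node;

    static Node *leaf(double v) {
        Node *n = calloc(1, sizeof *n);
        n->value = v;
        return n;
    }

    static Node *node(char op, Node *l, Node *r) {
        Node *n = calloc(1, sizeof *n);
        n->op = op;
        n->left = l;
        n->right = r;
        return n;
    }

    /* Depth-first evaluation: evaluate children, then apply the operator. */
    static double eval(const Node *n) {
        if (n->op == 0) return n->value;
        switch (n->op) {
            case '+': return eval(n->left) + eval(n->right);
            case '*': return eval(n->left) * eval(n->right);
            default:  return 0.0;
        }
    }

    int main(void) {
        /* (2 + 3) * 4, built by hand instead of by parser actions */
        Node *tree = node('*', node('+', leaf(2), leaf(3)), leaf(4));
        printf("%g\n", eval(tree));   /* prints 20 */
        return 0;
    }
    ```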

    Trying to encode type information in the grammar is a dead end. You could try to hack the lexer so that it consulted a symbol table for each variable, but you will find it very difficult to write a conflict-free grammar.

    Of the above solutions, #1 is the simplest and most effective for a calculator, unless you want to be able to use 64-bit integers. #2 is technically superior (IMHO), because it is easily extended to other datatypes like strings, but it's more work. (It's a good learning experience, but it might not be your learning priority right now.)

    If your eventual plan is to build a compiler, not just a calculator, then #3 is the way to go. It's also what you will want if you intend to add conditionals, loops, or user-defined functions. And it will probably give you the clearest idea of what evaluation is really about. But it's certainly even more work.