Search code examples
programming-languagesgrammarbisonshift-reduce-conflictimperative-languages

Bison ID reduction conflict


I'm working on a simple programming language, writing it from scratch, and using Flex and Bison. This language recognizes simple arithmetic expressions and imperative variable statements.

I'm having a problem in solving a reduction conflict for IDs (i.e., variable names). In this language, IDs can either be expressions, conditionals, function names, or even a type (conditionals and functions will only be implemented in the future). However, I'm having difficulty trying to translate this into the grammar due to reduction conflicts, since an ID can be reduced by multiple productions:

  • rval -> exp -> ID
  • rval -> cond -> ID
  • rval -> type -> ID

How can I restructure my grammar so that IDs can be accepted in any of these cases and have the validity of the ID types be checked later on by the type system? How do "real" programming languages deal with these kinds of problems?

Here is a simplified version of my Bison grammar (I omitted the rules' code for brevity):

stmt: rval                  
   | type ID               
   | type ID '=' rval      
   | ID '=' rval           
   ;

rval: exp                 
   | cond              
   | type                
   ;

type: TYPE                
   | ID                   

cond: BOOL_LITERAL         
   | ID                     
   ;

exp: INT_LITERAL          
   | ID                    
   | exp '+' exp           
   | exp '-' exp          
   | exp '*' exp         
   | exp '/' exp          
   | exp '%' exp          
   | '(' exp ')'            

Example of a valid program in this language:

type T = int; // Similar to a "typedef int T" in C
T n = 100;
int m = n / 10;
type B = bool;
B a = true;

Solution

  • I found out that these kinds of verifications are very tricky and sometimes even impossible to compute at the parsing level. The solution is to enforce type safety by performing type checking in a later stage/step, and considering both integer expressions and boolean expressions as valid syntax for the 'exp' production. Then, either while constructing the AST nodes or later when evaluating/compiling, determine the type of each node based on the type of its children and check the validity of the types involved. Example: the expression '1 + true' would be parsed as correct (according to syntax rules), but then after invoking your type system, it would raise an error.