Tags: c, grammar, bison, yacc, lex

Dot (.) disambiguation in Flex/Bison


In my Flex lexer I define two different tokens containing a dot:

  • float numbers: {DIGIT}+\.{DIGIT}+ (first)
  • the dot itself: .

Now, why do I need a single dot token? Because the language I'm writing my grammar for supports .-indexes, like: someObject.someField or someObject.3 (where 3 is an array index).

The problem shows up when I test it with a two-dimensional array, for example: someArray.0.1.

The grammar reads this as: ID DOT FLOAT, while I would obviously want an ID DOT (INTEGER DOT INTEGER) reading.

What is the solution?

Here's the relevant part of the Bison grammar:

keypath                 :   ID DOT ID                                                           
                        |   ID DOT INTEGER                                                      
                        |   ID DOT inline_call                                                  
                        |   inline_call DOT ID                                                  
                        |   inline_call DOT INTEGER                                             
                        |   inline_call DOT inline_call                                         
                        |   keypath[previous] DOT ID                                            
                        |   keypath[previous] DOT INTEGER                                       
                        |   keypath[previous] DOT inline_call 
                        ;

number                  :   INTEGER                                                             
                        |   REAL                                                                
                        ;

Solution

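  • Postponing the decision to recognize floats might help: let the lexer return only INT and DOT tokens, and let the parser decide whether INT DOT INT is a real number. I made a simple example to illustrate the idea.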

    Parser specification:

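    %{
    #include <stdio.h>
    #include <stdlib.h>

    int yylex(void);                /* provided by the Flex scanner */
    void yyerror(const char *msg);  /* called by the parser on a syntax error */
    %}

    %token ID INT DOT

    %%
    Sd  : S '\n'        { printf("accepted\n"); exit(0); }
        | R '\n'        { printf("other purpose accepted\n"); exit(0); }
        ;
    S   : S DOT ID      {}
        | S DOT INT     {}
        | ID            {}
        ;
    R   : INT DOT INT   { printf("real number\n"); }
        ;
    %%

    /* Print the parser's error message to stderr. */
    void yyerror(const char *msg)
    {
        fprintf(stderr, "%s\n", msg);
    }

    int main(void)
    {
        yyparse();
        return 0;
    }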
    

    Lex specification:

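    %{
    #include "d.tab.h"   /* token definitions generated by Bison */
    /* noyywrap (below): no yywrap() is needed at end of input */
    %}
    %option noyywrap

    %%
    [0-9]+                  { return INT; }
    [a-zA-Z][A-Za-z0-9]*    { return ID; }
    "."                     { return DOT; }
    .|\n                    { return *yytext; /* any other character is returned as itself */ }
    %%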
    

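    Now it recognizes s.2.3: the lexer returns ID DOT INT DOT INT, which matches S, while an input such as 2.3 is reduced by R as a real number.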
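
The same idea can be tried without generating a scanner or parser. The sketch below is a hand-written stand-in for the Flex/Bison pair above, not the answer's code: the names `tokenize`, `is_keypath`, `as_real`, and the `T_*` token kinds are hypothetical, and inline calls are not modeled. It assumes the lexer has no float rule at all, so `someArray.0.1` splits into ID DOT INT DOT INT, and a real number is only recognized afterwards from the INT DOT INT shape. It also shows one detail worth keeping in mind with this approach: the real's value should probably be rebuilt from the original text rather than from two integer values, otherwise 3.05 and 3.5 become indistinguishable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds, mirroring the ID / INT / DOT tokens above. */
enum kind { T_ID, T_INT, T_DOT, T_OTHER };

struct token {
    enum kind kind;
    const char *start;  /* points into the input buffer */
    size_t len;         /* length of the lexeme */
};

/* Split src into tokens WITHOUT a float rule: digits are always T_INT.
   Returns the number of tokens stored (at most max). */
size_t tokenize(const char *src, struct token *out, size_t max)
{
    size_t n = 0;
    const char *p = src;

    assert(out != NULL && max > 0);
    while (*p != '\0' && n < max) {
        const char *start = p;
        enum kind k;

        if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;
            k = T_INT;
        } else if (isalpha((unsigned char)*p)) {
            while (isalnum((unsigned char)*p)) p++;
            k = T_ID;
        } else {
            k = (*p == '.') ? T_DOT : T_OTHER;
            p++;
        }
        out[n].kind = k;
        out[n].start = start;
        out[n].len = (size_t)(p - start);
        n++;
    }
    return n;
}

/* Parser-side check for the shape ID (DOT (ID | INT))+ . */
int is_keypath(const struct token *t, size_t n)
{
    if (n < 3 || n % 2 == 0 || t[0].kind != T_ID)
        return 0;
    for (size_t i = 1; i < n; i += 2) {
        if (t[i].kind != T_DOT)
            return 0;
        if (t[i + 1].kind != T_ID && t[i + 1].kind != T_INT)
            return 0;
    }
    return 1;
}

/* Parser-side check for the shape INT DOT INT. On success, stores the
   value in *value and returns 1. The three lexemes are adjacent in the
   input, so the original text is converted as a whole; this keeps
   leading zeros in the fraction (3.05 must not turn into 3.5). */
int as_real(const struct token *t, size_t n, double *value)
{
    char buf[64];
    size_t len;

    if (n != 3 || t[0].kind != T_INT || t[1].kind != T_DOT || t[2].kind != T_INT)
        return 0;
    len = t[0].len + t[1].len + t[2].len;
    if (len >= sizeof buf)
        return 0;
    memcpy(buf, t[0].start, len);
    buf[len] = '\0';
    *value = strtod(buf, NULL);
    return 1;
}
```

Calling `tokenize("someArray.0.1", tokens, 16)` and passing the result to `is_keypath` accepts the input as a key path, while `tokenize("3.05", ...)` followed by `as_real` yields the real number. Because the tokenizer does not skip whitespace, `3 . 05` produces extra tokens and is not treated as a real, which matches the behavior of the `.|\n` rule in the Lex specification above.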