In my Flex lexer I define two different tokens containing a dot: {DIGIT}+\.{DIGIT}+ (a float literal) and \. (a single dot).
Now, why do I need a single dot token? Because the language I'm writing my grammar for supports .-indexes, like: someObject.someField or someObject.3 (where 3 is an array index).
The problem appears when I test it with a two-dimensional array, for example: someArray.0.1
The grammar reads this as: ID DOT FLOAT
while I would obviously want an ID DOT INTEGER DOT INTEGER reading.
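Simplified, my lexer rules look roughly like this (the ID and INTEGER patterns here are only illustrative; the FLOAT and DOT patterns are the ones quoted above):

DIGIT    [0-9]
%%
{DIGIT}+\.{DIGIT}+     { return FLOAT;   /* longest match wins, so "0.1" is one token */ }
{DIGIT}+               { return INTEGER; }
\.                     { return DOT; }
[a-zA-Z][A-Za-z0-9]*   { return ID; }
%%

Since Flex always prefers the longest match, after the first dot in someArray.0.1 the remaining input 0.1 is consumed by the FLOAT rule as a whole rather than as INTEGER DOT INTEGER.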
What is the solution?
Here's the relevant part of the Bison grammar:
keypath : ID DOT ID
        | ID DOT INTEGER
        | ID DOT inline_call
        | inline_call DOT ID
        | inline_call DOT INTEGER
        | inline_call DOT inline_call
        | keypath[previous] DOT ID
        | keypath[previous] DOT INTEGER
        | keypath[previous] DOT inline_call
        ;

number  : INTEGER
        | REAL
        ;
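In other words, for someArray.0.1 I would want the derivation

keypath -> keypath DOT INTEGER
        -> ID DOT INTEGER DOT INTEGER

using the ID DOT INTEGER and keypath DOT INTEGER alternatives above, but the lexer never hands the parser the two separate INTEGER tokens.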
Postponing the decision of recognizing floats to the parser might help. I made a simple example to illustrate the idea.
Parser specification:
%{
#include <stdio.h>
#include <stdlib.h>

int yylex(void);              /* provided by the Flex scanner */
void yyerror(const char *s);  /* required by the Bison-generated parser */
%}
%token ID INT DOT
%%
Sd : S '\n'  { printf("accepted"); exit(0); }
   | R '\n'  { printf("other purpose accepted"); exit(0); }
   ;
S  : S DOT ID   {}
   | S DOT INT  {}
   | ID         {}
   ;
R  : INT DOT INT  { printf("real number"); }
   ;
%%
void yyerror(const char *s){ fprintf(stderr, "%s\n", s); }

int main(){
    yyparse();
    return 0;
}
Lex specification:
%{
#include "d.tab.h"
%}
%option noyywrap
%%
[0-9]+                { return INT; }
[a-zA-Z][A-Za-z0-9]*  { return ID; }
"."                   { return DOT; }
.|\n                  { return *yytext; /* pass '\n' (and any other character) through */ }
%%
Now it recognizes s.2.3.
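To try it, assuming the two files are named d.y and d.l (the scanner includes d.tab.h, so the Bison input should produce that header):

bison -d d.y
flex d.l
cc d.tab.c lex.yy.c -o d

Entering s.2.3 followed by a newline prints accepted, while 2.3 followed by a newline prints real number and then other purpose accepted.

Carried back to the original grammar, the same idea would be to drop the {DIGIT}+\.{DIGIT}+ rule from the lexer, so it only ever returns INTEGER and DOT, and let the parser assemble real literals itself, roughly:

number : INTEGER
       | INTEGER DOT INTEGER    /* what the REAL token used to cover */
       ;

Whether this stays conflict-free depends on where number and keypath can appear in the rest of the grammar, and if you need the numeric value you have to rebuild it in the action from the two parts (keeping the fractional part's original lexeme around, since 2.5 and 2.05 differ only in their text).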