I am writing a parser and a scanner on Ubuntu. In my flex code "scanner.l" I have an IDENTIFIER token and a BOOL_LITERAL token. IDENTIFIER is any word and BOOL_LITERAL is either true or false. In my bison code "parser.y" I have a grammar which should be able to accept a BOOL_LITERAL through the primary production.
However, the code is not working as intended; running it produces a syntax error.
Here are all of my files:
scanner.l
```
%{
#include <string>
#include <vector>
using namespace std;
#include "listing.h"
#include "tokens.h"
%}
%option noyywrap
ws [ \t\r]+
comment (\-\-.*\n)|\/\/.*\n
line [\n]
digit [0-9]
int {digit}+
real {int}"."{int}([eE][+-]?{digit})?
boolean ["true""false"]
punc [\(\),:;]
addop ["+""-"]
mulop ["*""\/"]
relop [="/=">">=""<="<]
id [A-Za-z][A-Za-z0-9]*
%%
{ws} { ECHO; }
{comment} { ECHO; nextLine();}
{line} { ECHO; nextLine();}
{relop} { ECHO; return(RELOP); }
{addop} { ECHO; return(ADDOP); }
{mulop} { ECHO; return(MULOP); }
begin { ECHO; return(BEGIN_); }
boolean { ECHO; return(BOOLEAN); }
end { ECHO; return(END); }
endreduce { ECHO; return(ENDREDUCE); }
function { ECHO; return(FUNCTION); }
integer { ECHO; return(INTEGER); }
real { ECHO; return(REAL); }
is { ECHO; return(IS); }
reduce { ECHO; return (REDUCE); }
returns { ECHO; return(RETURNS); }
and { ECHO; return(ANDOP); }
{boolean} { ECHO; return(BOOL_LITERAL); }
{id} { ECHO; return(IDENTIFIER);}
{int} { ECHO; return(INT_LITERAL); }
{real} { ECHO; return(REAL_LITERAL); }
{punc} { ECHO; return(yytext[0]); }
. { ECHO; appendError(LEXICAL, yytext); }
%%
```
parser.y
```
%{
#include <string>
using namespace std;
#include "listing.h"
int yylex();
void yyerror(const char* message);
%}
%error-verbose
%token INT_LITERAL REAL_LITERAL BOOL_LITERAL
%token IDENTIFIER
%token ADDOP MULOP RELOP ANDOP
%token BEGIN_ BOOLEAN END ENDREDUCE FUNCTION INTEGER IS REDUCE RETURNS REAL
%%
function:
    function_header optional_variable body ;
function_header:
    FUNCTION IDENTIFIER RETURNS type ';' ;
parameters:
    parameters ',' |
    parameter ;
parameter:
    IDENTIFIER ':' type |
    ;
optional_variable:
    variable |
    ;
variable:
    IDENTIFIER ':' type IS statement_ ;
type:
    INTEGER |
    BOOLEAN |
    REAL ;
body:
    BEGIN_ statement_ END ';' ;
statement_:
    statement ';' |
    error ';' ;
statement:
    expression |
    REDUCE operator reductions ENDREDUCE ;
operator:
    ADDOP |
    MULOP ;
reductions:
    reductions statement_ |
    ;
expression:
    expression ANDOP relation |
    relation ;
relation:
    relation RELOP term |
    term;
term:
    term ADDOP factor |
    factor ;
factor:
    factor MULOP primary |
    primary ;
primary:
    '(' expression ')' |
    INT_LITERAL |
    REAL_LITERAL |
    BOOL_LITERAL |
    IDENTIFIER ;
%%
void yyerror(const char* message)
{
    appendError(SYNTAX, message);
}
int main(int argc, char *argv[])
{
    firstLine();
    yyparse();
    lastLine();
    return 0;
}
```
Other associated files:
listing.h
```
enum ErrorCategories {LEXICAL, SYNTAX, GENERAL_SEMANTIC, DUPLICATE_IDENTIFIER,
    UNDECLARED};
void firstLine();
void nextLine();
int lastLine();
void appendError(ErrorCategories errorCategory, string message);
```
listing.cc
```
#include <cstdio>
#include <string>
using namespace std;
#include "listing.h"
static int lineNumber;
static string error = "";
static int totalErrors = 0;
static void displayErrors();
void firstLine()
{
    lineNumber = 1;
    printf("\n%4d ",lineNumber);
}
void nextLine()
{
    displayErrors();
    lineNumber++;
    printf("%4d ",lineNumber);
}
int lastLine()
{
    printf("\r");
    displayErrors();
    printf(" \n");
    return totalErrors;
}
void appendError(ErrorCategories errorCategory, string message)
{
    string messages[] = { "Lexical Error, Invalid Character ", "",
        "Semantic Error, ", "Semantic Error, Duplicate Identifier: ",
        "Semantic Error, Undeclared " };
    error = messages[errorCategory] + message;
    totalErrors++;
}
void displayErrors()
{
    if (error != "")
        printf("%s\n", error.c_str());
    error = "";
}
```
makefile
```
compile: scanner.o parser.o listing.o
	g++ -o compile scanner.o parser.o listing.o
scanner.o: scanner.c listing.h tokens.h
	g++ -c scanner.c
scanner.c: scanner.l
	flex scanner.l
	mv lex.yy.c scanner.c
parser.o: parser.c listing.h
	g++ -c parser.c
parser.c tokens.h: parser.y
	bison -d -v parser.y
	mv parser.tab.c parser.c
	mv parser.tab.h tokens.h
listing.o: listing.cc listing.h
	g++ -c listing.cc
```
Note: I have to run the makefile, then "bison -d parser.y", and finally the makefile again. Then I run the command "./compile < incremental1.txt" and get the following error:
[Image: screenshot of the syntax error output]
Please help me understand why I am getting a syntax error.
@SoronelHaetir has certainly identified one of the problems with your parser. But that problem cannot create the syntax error message which appears in your image. [Note 1] Your grammar allows identifiers in exactly the same place as boolean literals, so the fact that `true` is actually scanned as an identifier (your `boolean` pattern is a character class that matches a single character, so for `true` the longer `{id}` match wins) will not produce a syntax error in an expression which starts `true and`. (In other words, `x and ...` would be parsed just the same.)
The problem is actually your use of `8.E+1` as a numeric literal. Your rule for `REAL_LITERAL` uses the pattern `{int}"."{int}([eE][+-]?{digit})?`, which doesn't match `8.E+1` because there is no `{int}` following the `.`. So when the scanner reaches the input `8.E+1`, it produces the `INT_LITERAL` `8`, which is the longest match. When it is asked for the next token, it first sees a `.`, which doesn't match any of the token patterns, so it is consumed without returning a token (flex's default action just echoes it; the catch-all `.` rule in the posted scanner also echoes it and records a lexical error). It then continues to the next character (`E`), which matches the `IDENTIFIER` pattern. And the input `true and 8 E ...` is indeed a syntax error: there is an unexpected identifier following the `8`, and that's what bison reports.
Aside from fixing the pattern for real literals, you should make sure that you do something sensible with unrecognised characters. The default action in flex, which just copies any character that can't match a pattern to the output and otherwise ignores it, is not of much use, particularly in debugging (as I think the above explanation demonstrates).
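There are a number of other issues with your patterns involving the same misconception about the syntax of character classes as shown in the boolean literal pattern. A bracket expression such as `["true""false"]` matches exactly one character from the set (here `"`, `t`, `r`, `u`, `e`, `f`, `a`, `l`, `s`), and quotes have no special meaning inside it. For example, `addop` (`["+""-"]`) matches only `+` or `"`, because the `"-"` inside the brackets is a range from `"` to `"`, so a minus sign never matches it; and `relop` can only match single characters, never `/=`, `>=` or `<=`. This indicates to me that you did not attempt to test your lexical scanner before hooking it into your parser. That's an essential step in writing parsers; if your lexical scanner is not returning the tokens you expect it to return, you're going to have a lot of trouble trying to figure out what errors there might be in your grammar.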
You might find the debugging techniques outlined in this answer useful. (That post also has links to the flex and bison manuals. Section 6 of the flex manual is a brief but complete guide to the syntax of flex patterns, and you might want to take a few minutes to read it.)