I want my code to print the number of syntax errors that occur in an input file. Here's my code:
%{
#include <stdio.h>
#include <math.h>
void yyerror(char *);
extern int yylval;
extern FILE *yyin;
extern FILE *yyout;
extern int yylineno;
extern int yyparse(void);
extern int yylex(void);
extern int yywrap() { return 1; }
extern char* yytext;
int errors;
%}
%debug
%start m_class
%token IF ELSE INT CHAR CLASS NEW GURISE VOID WHILE
%token PUBLIC PROTECTED PRIVATE STATIC FINAL ABSTRACT
%token PLUS MINUS MUL DIV MODULO
%token EQ NEQ GRT LT GREQ LEQ
%token OR AND NOT
%token AR_PAR DEK_PAR AR_AGK DEK_AGK AR_STRO DEK_STRO
%token SEMICOLON ANATHESI COMA
%token MY_INT SINT MY_CHAR ID
%right ANATHESI
%left OR AND
%nonassoc EQ NEQ GRT LT GREQ LEQ
%left PLUS MINUS MUL DIV MODULO
%right NOT
%right "then" ELSE
%%
m_class: m_class class_declaration
| class_declaration
| error "\n" {yyerrok; errors++; yyclearin;}
;
class_declaration: CLASS ID class_body
| error "\n" {yyerrok; errors++; yyclearin;}
;
class_body: AR_STRO variable_declaration constructor method_declaration DEK_STRO
| error "\n" {yyerrok; errors++; yyclearin;}
;
variable_declaration:variable variable_declaration
|variable
|array_declaration
|array_declaration variable_declaration
| error "\n" {yyerrok; errors++; yyclearin;}
;
variable: var_type ID SEMICOLON
| error "\n" {yyerrok; errors++; yyclearin;}
;
var_type: INT
|CHAR
| error "\n" {yyerrok; errors++; yyclearin;}
;
array_declaration: ID ANATHESI NEW var_type AR_AGK MY_INT DEK_AGK SEMICOLON
| error "\n" {yyerrok; errors++; yyclearin;}
;
constructor: modifier ID AR_STRO variable_declaration DEK_STRO
| error "\n" {yyerrok; errors++; yyclearin;}
;
modifier: PUBLIC
| PROTECTED
| PRIVATE
| STATIC
| FINAL
| ABSTRACT
| error "\n" {yyerrok; errors++; yyclearin;}
;
method_declaration: modifier meth_type ID parameters meth_body
| error "\n" {yyerrok; errors++; yyclearin;}
;
meth_type: VOID
| var_type
| error "\n" {yyerrok; errors++; yyclearin;}
;
parameters: AR_PAR par_body DEK_PAR
| error "\n" {yyerrok; errors++; yyclearin;}
;
par_body: var_type ID
| par_body COMA var_type ID
| error "\n" {yyerrok; errors++; yyclearin;}
;
meth_body: AR_STRO bodybuilder DEK_STRO
| error "\n" {yyerrok; errors++; yyclearin;}
;
bodybuilder: statement GURISE expression SEMICOLON
|statement bodybuilder
|statement
| error "\n" {yyerrok; errors++; yyclearin;}
;
statement: anathesh
| if_statement
| while_statement
| error "\n" {yyerrok; errors++; yyclearin;}
;
anathesh:atath SEMICOLON
| atath numeric_expression SEMICOLON
| error "\n" {yyerrok; errors++; yyclearin;}
;
atath: ID ANATHESI orisma
|ID AR_AGK MY_INT DEK_AGK ANATHESI orisma
| error "\n" {yyerrok; errors++; yyclearin;}
;
orisma: ID
|MY_INT
|SINT
|MY_CHAR
| error "\n" {yyerrok; errors++; yyclearin;}
;
expression: testing_expression
| numeric_expression
| logical_expression
| ID
| MY_INT
| SINT
| MY_CHAR
| error "\n" {yyerrok; errors++; yyclearin;}
;
numeric_expression: expression PLUS expression
| expression MINUS expression
| expression MUL expression
| expression DIV expression
| expression MODULO expression
| error "\n" {yyerrok; errors++; yyclearin;}
;
testing_expression: expression EQ expression
| expression NEQ expression
| expression GRT expression
| expression LT expression
| expression GREQ expression
| expression LEQ expression
| error "\n" {yyerrok; errors++; yyclearin;}
;
logical_expression: expression OR expression
| expression AND expression
| expression NOT expression
| error "\n" {yyerrok; errors++; yyclearin;}
;
if_statement: IF sin8iki statement %prec "then"
| IF sin8iki statement ELSE statement
| error "\n" {yyerrok; errors++; yyclearin;}
;
sin8iki: AR_PAR testing_expression DEK_PAR
| AR_PAR logical_expression DEK_PAR
| error "\n" {yyerrok; errors++; yyclearin;}
;
while_statement: WHILE sin8iki statement
| error "\n" {yyerrok; errors++; yyclearin;}
;
%%
void yyerror(char *s) {
errors++;
printf("\n------- ERROR AT LINE #%d.\n\n", yylineno);
fprintf(stderr, "%d: error: '%s' at '%s', yylval=%u\n", yylineno, s, yytext, yylval);
}
int main (int argc, char **argv) {
++argv;
--argc;
errors=0;
if (argc > 0)
yyin = fopen (argv[0], "r");
else
yyin = stdin;
yyout = fopen ("output","w");
yyparse();
if(errors==0)
printf("komple");
else
printf("la8oi: %d", errors);
return 0;
}
I tried to modify yyerrok, but it seems I can't. I also tried putting yyparse in a for loop. The input file has 5 syntax errors, but it prints only 1. Any ideas?
The production:
error "\n" {yyerrok; errors++; yyclearin;}
probably doesn't do what you expect.
error productions are not particularly different from normal productions; the main difference is that the error production synchronizes with the following token (normally a terminal). [1] For bison, double-quoted strings ("foo") are valid terminals, but there is no easy way to get the corresponding token number, which makes life difficult for the lexer. [2] That's different from single-quoted literals ('a'), which must be a single character, and which represent the token whose number is the integer corresponding to that character. That's similar to the difference in C between single-quoted and double-quoted literals.
So your error productions will try to synchronize with the token "\n", whose token number is generated automatically by bison. But it's unlikely that your lexer ever produces this token number, first because it doesn't know what the number is, and second because I suspect your lexer ignores whitespace. [3] Without seeing the lexer, it's hard to tell, but those seem like reasonable assumptions.
Consequently, the first error production will discard tokens until it reaches the end of file, at which point it will fail and the parse will terminate, reporting one error.
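A minimal sketch of what synchronizing on a real token could look like, assuming the lexer returns SEMICOLON as declared in the question's %token lines (the lexer was not shown, so that is an assumption). It puts error rules in only two places rather than in every nonterminal, and it leaves the counting to yyerror, which already does errors++ in the question's code; incrementing in the action as well would count each error twice. Whether these two rules are conflict-free in the full grammar would still need to be checked by running bison on it.

```yacc
/* Sketch only: these replace the `error "\n"` alternatives of two rules
   from the question. SEMICOLON is assumed to be returned by the lexer. */
variable: var_type ID SEMICOLON
        | error SEMICOLON   { yyerrok; }  /* resume after the bad declaration */
        ;

statement: anathesh
        | if_statement
        | while_statement
        | error SEMICOLON   { yyerrok; }  /* resume after the bad statement */
        ;
```

yyclearin is left out on purpose: by the time the action runs, the SEMICOLON has already been shifted, so clearing the lookahead could throw away the first token of the next, possibly valid, statement.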
Notes.
[1] More accurately, it ensures that the lookahead token could be shifted. So it discards tokens until it finds one which might be the first terminal in the string produced by the rest of the error production. That doesn't guarantee that the production is reducible, but it's probably the best bison can do.
[2] You can declare a token name and a token literal by putting them (in that order) in a %token declaration (or any of the precedence declarations). I think that's what you intended to do with "then", but the way you wrote it won't work; you need:
%right THEN "then" ELSE
(or, better)
%right THEN "then" ELSE "else"
which would declare that the token number THEN (which is an integer constant generated by bison) can also be written as "then" in the grammar. Obviously, it doesn't magically alter the lexer so that it automatically recognizes the string; that's still your responsibility. The main advantages of declaring tokens in this form are that it makes your grammar more readable, and that bison can produce better error messages if you enable them with %error-verbose.
[3] If your lexer returns newline tokens, I'd expect them to be returned as token '\n' (which is 10), but that's hardly ever a good idea, since the grammar would then have to be written in a way which explicitly allowed newlines. Some languages do this (Python, for example), but for a language in which newlines can appear just about anywhere, it makes the grammar really complicated.
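To make the token-alias declarations described in the notes concrete, here is a small sketch applied to the if_statement rule from the question. The aliases "if" and "while" are illustrative choices, not something the question's lexer is known to use, and the lexer still has to return IF, ELSE and so on as before. The %define line is the Bison 3.0+ spelling of %error-verbose.

```yacc
/* Sketch only: name/alias pairs, in that order. The aliases are hypothetical. */
%token IF "if" WHILE "while"
%right THEN "then" ELSE "else"   /* THEN exists only to carry precedence */
%define parse.error verbose      /* Bison 3.0+; older versions: %error-verbose */
%%
if_statement: "if" sin8iki statement %prec THEN
            | "if" sin8iki statement "else" statement
            ;
```

With the aliases in place, verbose error messages can mention "else" rather than the internal name ELSE, which is usually easier to read.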