I need proper error messages for syntax errors in a grammar I'm writing. I've figured out that I can define a rule (not sure about the terminology) for newlines in the flex file that increments a line-number counter, and I can use that in yyerror(const char*). However, I also need to know the exact position where the error occurred to produce better error messages. This is what I would want the error messages to look like:
Syntax error on line X:
SOME ERRONEOUS TEXT ON LINE X
_______________^
Expected other text.
How could I get the column information as well as the text on the erroneous line?
Thank you in advance.
Output Unexpected and Expected Tokens
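Simply by defining
#define YYERROR_VERBOSE 1
yyerror already receives a message such as
syntax error, unexpected '+', expecting NUM or '('
Note that YYERROR_VERBOSE is a legacy macro; with a recent Bison, the directive %define parse.error verbose requests the same kind of message.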
Print Line Number
To print the current line number you can make use of yylineno. You need to declare it with
extern int yylineno;
in the .y file.
In the .l flex file you need to add:
%option yylineno
Print Column
To get column information, you must track the columns yourself in the lexer file. After each token is read, add its length (e.g. strlen(yytext), or flex's yyleng) to a running column counter. For error reporting you want the column where the token starts, so a second variable remembers the counter's value from before the token was read.
You could use a simple macro for it:
#define HANDLE_COLUMN column = next_column; next_column += strlen(yytext)
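To make the two-variable idea concrete outside of flex, here is a minimal sketch in plain C. The struct column_tracker and both functions are hypothetical names made up for this illustration; they are not part of flex or bison. The sketch assumes every character is one column wide (no tab expansion).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: not part of flex or bison. */
struct column_tracker {
    int column;       /* column where the most recent token starts (1-based) */
    int next_column;  /* column where the next token will start */
};

/* Record one token: remember where it starts, then skip past its text.
 * This is what HANDLE_COLUMN does with yytext. */
static void tracker_token(struct column_tracker *t, const char *text)
{
    assert(t != NULL && text != NULL);
    t->column = t->next_column;
    t->next_column += (int)strlen(text);
}

/* Call after a newline token: the next token starts at column 1 again.
 * column is left alone so it still points at the newline itself. */
static void tracker_reset_line(struct column_tracker *t)
{
    assert(t != NULL);
    t->next_column = 1;
}
```

Starting from a tracker initialised to {1, 1} and feeding it the tokens of 5+2++1 one by one, column ends up at the start of the second '+', which is the position the parser complains about.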
Print Current Input Line
To print the current input line, you must track it yourself. You can read lines from yyin yourself and use this data in the lexer by defining the macro YY_INPUT accordingly. There is this nice answer https://stackoverflow.com/a/43303098 which explains how it works.
The author also shows an example of how the current column can be determined using the macro YY_USER_ACTION.
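The line-buffering idea behind such a YY_INPUT definition can be sketched as an ordinary C function. This is only an illustration under assumptions: struct line_reader, line_reader_fill and line_reader_free are hypothetical names, and the sketch relies on the POSIX function getline().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper type: holds the line currently being handed out. */
struct line_reader {
    char  *line;      /* buffer owned by getline() */
    size_t cap;       /* allocated size of line */
    size_t consumed;  /* bytes of line already handed out */
    size_t available; /* bytes of line not yet handed out */
};

/* Copy up to max_size bytes of the current line into buf, reading a new
 * line from in once the previous one is used up. Returns the number of
 * bytes copied; 0 means end of input or a read error. */
static size_t line_reader_fill(struct line_reader *r, FILE *in,
                               char *buf, size_t max_size)
{
    assert(r != NULL && in != NULL && buf != NULL);
    if (r->available == 0) {
        ssize_t len = getline(&r->line, &r->cap, in);
        if (len < 0) {
            return 0;
        }
        r->consumed = 0;
        r->available = (size_t)len;
    }
    size_t count = r->available < max_size ? r->available : max_size;
    memcpy(buf, r->line + r->consumed, count);
    r->consumed += count;
    r->available -= count;
    return count;
}

/* Release the buffer that getline() allocated. */
static void line_reader_free(struct line_reader *r)
{
    assert(r != NULL);
    free(r->line);
    r->line = NULL;
    r->cap = 0;
    r->consumed = 0;
    r->available = 0;
}
```

Because the whole line stays in r->line while the lexer consumes it piece by piece, an error handler can print the complete line even when the error is detected in the middle of it.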
Simple Example
A simple, self-contained example of a calculator that handles addition and subtraction is shown below (calc.l and calc.y).
With the input 5+3+2+1, it prints:
5+3+2+1
=11
An erroneous input such as '5+2++1' produces:
error: syntax error, unexpected '+', expecting NUM or '(' in line 3, column 5
5+2++1
____^
calc.l
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include "y.tab.h"
extern int yylval;
static int next_column = 1;   /* column where the next token starts */
int column = 1;               /* column where the current token starts */
#define HANDLE_COLUMN column = next_column; next_column += strlen(yytext)
char *lineptr = NULL;         /* current input line, also printed by yyerror */
size_t n = 0;
size_t consumed = 0;
ssize_t available = 0;        /* signed: getline() returns -1 on EOF or error */
size_t min(size_t a, size_t b);
/* Feed the lexer from a line buffer so the whole line stays available. */
#define YY_INPUT(buf,result,max_size) {\
    if(available <= 0) {\
        consumed = 0;\
        available = getline(&lineptr, &n, yyin);\
        if (available < 0) {\
            if (ferror(yyin)) { perror("read error"); }\
            available = 0;\
        }\
    }\
    result = min(available, max_size);\
    strncpy(buf, lineptr + consumed, result);\
    consumed += result;\
    available -= result;\
}
%}
%option noyywrap noinput nounput yylineno
%%
[\t ]+ { HANDLE_COLUMN; }
[0-9]+ { HANDLE_COLUMN; yylval = atoi(yytext); return NUM; }
\n { HANDLE_COLUMN; next_column = 1; return '\n'; }
. { HANDLE_COLUMN; return yytext[0]; }
%%
size_t min(size_t a, size_t b) {
    return b < a ? b : a;
}
calc.y
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s);
extern int yylineno;
extern int column;
extern char *lineptr;
#define YYERROR_VERBOSE 1
%}
%token NUM
%left '-' '+'
%left '(' ')'
%%
LINE: { $$ = 0; }
    | LINE EXPR '\n' { printf("%s=%d\n", lineptr, $2); }
    | LINE '\n'
    ;
EXPR: NUM { $$ = $1; }
    | EXPR '-' EXPR { $$ = $1 - $3; }
    | EXPR '+' EXPR { $$ = $1 + $3; }
    | '(' EXPR ')' { $$ = $2; }
    ;
%%
/* Print the message, the offending line, and a caret under the error column. */
void yyerror(const char *str)
{
    fprintf(stderr,"error: %s in line %d, column %d\n", str, yylineno, column);
    fprintf(stderr,"%s", lineptr);
    for(int i = 0; i < column - 1; i++)
        fprintf(stderr,"_");
    fprintf(stderr,"^\n");
}
int main(void)
{
    yyparse();
    free(lineptr);
    return 0;
}
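The marker printing in yyerror can also be factored into small helpers. The following is a sketch only: make_marker and print_error_context are hypothetical names, and the fixed 256-byte marker buffer is an arbitrary assumption. Unlike the loop above, it also adds a newline when the offending line does not end with one (for example the last line of a file), so the caret never ends up on the same line as the input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a marker such as "____^" into dst so that the
 * caret sits under the given 1-based column. Returns 0 on success, -1 if
 * dst is too small or the column is not positive. */
static int make_marker(char *dst, size_t cap, int column)
{
    assert(dst != NULL);
    if (column < 1 || (size_t)column + 1 > cap) {
        return -1;
    }
    memset(dst, '_', (size_t)column - 1);
    dst[column - 1] = '^';
    dst[column] = '\0';
    return 0;
}

/* Hypothetical helper: prints the offending line followed by the marker. */
static void print_error_context(FILE *out, const char *line, int column)
{
    char marker[256];
    size_t len;

    assert(out != NULL && line != NULL);
    len = strlen(line);
    fputs(line, out);
    if (len == 0 || line[len - 1] != '\n') {
        fputc('\n', out);  /* keep the marker on its own line */
    }
    if (make_marker(marker, sizeof marker, column) == 0) {
        fprintf(out, "%s\n", marker);
    }
}
```

Inside yyerror, the two fprintf calls for the line and the loop could then be replaced by a single call such as print_error_context(stderr, lineptr, column).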
Build Command
Depending on your system, a build command would look similar to the following:
flex calc.l
yacc -d calc.y
cc -Wextra -Wall lex.yy.c y.tab.c