I'm trying to implement a simple lexer and parser using flex and bison.
All I want is to parse these:
Just a sequence separated by commas, which may or may not contain spaces. Here is my grammar:
KEY_SET : KEY
{
printf("keyset 1");
}
| KEY COMMA KEY_SET
{
printf("keyset 2");
};
I declared KEY and COMMA as tokens (with %token).
But it gives me a syntax error whenever I press Enter or type any whitespace.
So I even declared IGNORE [ \t\n] in flex.
And in the parser I added a new rule:
IGNORE_BLOCK : IGNORE
{
printf("\n...ignoring...\n");
};
But this doesn't even come into play.
It keeps giving me a syntax error.
How can I resolve this?
Lexer :
%{
#include "y.tab.h"
%}
%option noyywrap
COMMA [,]
KEY [[:alpha:][:alnum:]*]
IGNORE [ \t\n]
%%
{COMMA} {return COMMA;}
{KEY} {return KEY;}
{IGNORE} {return IGNORE;}
. {printf("Exiting...\n");exit(0);}
%%
Parser :
%{
#include<stdio.h>
void yyerror (char const *s);
int yywrap();
//int extern yylex();
%}
%token COMMA
%token KEY
%token IGNORE
%%
KEY_SET : KEY
{
printf("keyset 1");
}
| KEY COMMA KEY_SET
{
printf("keyset 2");
};
IGNORE_BLOCK : IGNORE
{
printf("\n...ignoring...\n");
};
%%
int main(int argc, char **argv)
{
while(1)
{
printf("****************\n");
yyparse();
char ign;
scanf("%c",&ign);
}
return 0;
}
int yywrap()
{
return 1;
}
void yyerror (char const *s) {
fprintf (stderr, "%s\n", s);
}
Command I'm using to build :
flex test.l
bison -dy test.y
gcc lex.yy.c y.tab.c -o test.exe
Your flex file contains a series of rules, each consisting of a pattern and an action. Contrary to popular belief, you do not need to "declare" your patterns before using them.
If you want to ignore whitespace in your lexer, you need a rule which does nothing.
You had an error in your key pattern, which I fixed; your pattern would not have accepted keys with more than one letter. Also, it is very bad style to call exit in your scanner. Let the parser deal with errors.
%{
#include "y.tab.h"
%}
%option noyywrap
%%
  /* Removed the COMMA rule. See text below. */
  /* "," {return COMMA;} */
  /* Compare this pattern with the one you used */
[[:alpha:]][[:alnum:]]*  {return KEY;}
  /* Recognise and ignore whitespace. */
[[:space:]]+             ; /* Do nothing */
  /* Send unrecognised input to the parser. */
.                        {return *yytext;}
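To see why the original key pattern rejects longer keys, here is a minimal sketch that checks both patterns with POSIX `regcomp`/`regexec` rather than flex. It assumes that POSIX extended regular expressions treat these bracket expressions the same way flex does, which holds for this simple case; the helper `matches` and the macro names are made up for the illustration.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if `text` matches the POSIX extended
 * regex `pattern`. The patterns below are anchored with ^ and $, so a
 * match means the whole string matched. */
bool matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;               /* treat a bad pattern as "no match" */
    bool ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/* Original pattern: ONE bracket expression, so it matches exactly one
 * character: a letter, a digit, or a literal '*'. */
#define KEY_ORIGINAL "^[[:alpha:][:alnum:]*]$"

/* Corrected pattern: a letter followed by any number of letters or digits. */
#define KEY_FIXED "^[[:alpha:]][[:alnum:]]*$"
```

With these definitions, `matches(KEY_ORIGINAL, "abc")` is false while `matches(KEY_FIXED, "abc")` is true, which is the difference between the two flex patterns.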
Your parser does not need IGNORE, which was pointless anyway because the grammar does not produce it. Bison probably warned you about that.
You can simplify your parser in some other ways:
- yywrap is not needed, since your lexer has %option noyywrap.
- The COMMA terminal can be written as ',' if you just remove the "," pattern from the lexer (since the fallback rule . { return *yytext; } will work correctly for any single-character literal).
For testing, you probably want to parse one line at a time instead of ignoring syntax errors.
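The three scanner rules (return KEY for an identifier, skip whitespace, pass any other character through as its own token) can be mimicked by hand to show why the fallback rule works for ','. This is only a sketch, not what flex generates: the function `next_token` and the value chosen for `KEY` are assumptions made for the illustration. It relies on bison giving named tokens numbers above 255, so a plain character code cannot clash with them.

```c
#include <ctype.h>

/* Hypothetical token code for this sketch; in a real build the value
 * comes from the header that bison generates. */
enum { KEY = 258 };

/* Returns the next token from *cursor and advances the cursor:
 *   KEY        for a letter followed by letters or digits,
 *   0          at end of input (the code yylex uses for end of input),
 *   otherwise  the character itself, e.g. ',' for a comma. */
int next_token(const char **cursor)
{
    const char *p = *cursor;

    while (isspace((unsigned char)*p))      /* whitespace: do nothing */
        p++;

    if (*p == '\0') {
        *cursor = p;
        return 0;
    }

    if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))
            p++;
        *cursor = p;
        return KEY;
    }

    *cursor = p + 1;                        /* fallback: single character */
    return (unsigned char)*p;
}
```

Because the comma arrives as the token ',', the grammar can use the literal ',' directly and needs no %token declaration for it.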
I'd also recommend not using the "legacy" flag -y when you invoke bison; that flag should only be used on old existing yacc grammar files, since it may interfere with modern bison features. Without -y, bison will write the generated C code to filename.tab.c and the generated header to filename.tab.h. If you don't like those names, you can use the -o flag to specify the name of the generated C code (and the header will have the same name, with the extension changed to .h).
That might produce something like this:
(Note that I changed KEY_SET to key_set because the usual style in grammars is that ALL_CAPS are tokens, while non-terminals are lower-case. I also changed it from right-recursive to left-recursive to avoid a problem you would notice if your production action printed the value of the KEY token, assuming your lexer had given it a value.)
Parser (parser.y):
%{
#include<stdio.h>
void yyerror (char const *s);
int yylex(void);
/* Defined in the flex file */
void set_input(const char* input);
%}
%token KEY
%%
key_set : KEY { printf("keyset 1\n"); }
| key_set ',' KEY { printf("keyset 2\n"); };
%%
int main(int argc, char **argv)
{
    char buffer[BUFSIZ];
    while (1)
    {
        printf("****************\n");
        char* input = fgets(buffer, sizeof buffer, stdin);
        /* fgets returns NULL at end of input or on error */
        if (input == NULL) break;
        set_input(input);
        yyparse();
    }
    return 0;
}
void yyerror (char const *s) {
fprintf (stderr, "%s\n", s);
}
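The remark about left versus right recursion can be illustrated without bison. The sketch below is not a parser; it only models the order in which the rule actions would run, on the assumption that the problem referred to is the order in which keys are seen. With the right-recursive rule, the nested key_set has to be reduced before the enclosing one, so the action for the first key runs last. All names here (`struct trace`, `action`, `right_recursive`, `left_recursive`) are hypothetical.

```c
#include <stddef.h>

/* Records the order in which the simulated actions see each key. */
struct trace {
    const char *seen[8];
    size_t count;
};

void action(struct trace *t, const char *key)
{
    if (t->count < 8)
        t->seen[t->count++] = key;
}

/* Models  key_set : KEY | KEY ',' key_set
 * The nested key_set is reduced first, so the actions fire from the
 * last key back to the first. */
void right_recursive(struct trace *t, const char **keys, size_t n)
{
    if (n == 0)
        return;
    right_recursive(t, keys + 1, n - 1);
    action(t, keys[0]);
}

/* Models  key_set : KEY | key_set ',' KEY
 * Each rule is reduced as soon as its KEY has been read, so the actions
 * fire in input order. */
void left_recursive(struct trace *t, const char **keys, size_t n)
{
    for (size_t i = 0; i < n; i++)
        action(t, keys[i]);
}
```

For the keys a, b, c the right-recursive model records c, b, a and the left-recursive one records a, b, c. Right recursion also keeps every pending key on the parser stack until the end of the list, which is another reason to prefer the left-recursive form.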
Lexer (lexer.l):
%{
#include "parser.tab.h"
%}
%option noinput nounput nodefault yylineno
%option noyywrap
%%
[[:alpha:]][[:alnum:]]* {return KEY;}
[[:space:]]+ ; /* Do nothing */
. {return *yytext;}
%%
static YY_BUFFER_STATE flex_buffer;
void set_input(const char* input) {
yy_delete_buffer(flex_buffer);
flex_buffer = yy_scan_string(input);
}
Build:
flex lexer.l
bison -d parser.y
gcc lex.yy.c parser.tab.c -o parser.exe
Your grammar does not allow empty input, but that's fine. For testing purposes, though, you might want to add a test in the loop which reads input lines to only call the parser if the line is not empty.
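A minimal sketch of such a check, assuming a helper named `is_blank_line` (the name is made up here) that is called on the line returned by fgets:

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true if `line` holds nothing but whitespace, including the
 * trailing newline that fgets keeps in the buffer. */
bool is_blank_line(const char *line)
{
    while (*line != '\0') {
        if (!isspace((unsigned char)*line))
            return false;
        line++;
    }
    return true;
}
```

In the read loop, a line such as `if (is_blank_line(input)) continue;` placed before the call to set_input would skip the parser for empty lines.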