
Flex/Flex++ syntax error - "Unrecognized rule"


I'm writing a scanner using Flex++ to go with a bisonc++ parser, and this block of code always produces an "unrecognized rule" error.

%{
#include "Parserbase.h"
%}

%option noyywrap

num         [0-9]+
float       [0-9]+"."[0-9]+
comment     [["//"[.]*\n] | ["/\*"[.]*"\*/"]]
varname     [a-zA-Z][a-zA-Z0-9_]*

%%


";"             {return ParserBase::SEMICOLON;}
"\n"            {return ParserBase::ENDLINE;}

"int"           {return ParserBase::INT;}
"="             {return ParserBase::EQUALS;}
{num}           {return ParserBase::NUM;}
{comment}       {return ParserBase::COMMENT;}
{varname}       {return ParserBase::VARNAME;}

Running make always produces the following:

bisonc++ Compiler.y
[Warning] Terminal symbol(s) not used in productions:
257: NUM
261: ENDLINE
g++ -c parse.cc
flex++ Compiler.l
Compiler.l:21: unrecognised rule
make: *** [lex.yy.cc] Error 1

I've tried moving the rules around and changing the alias to a simple [a-zA-Z] or even just [a-z], all to no avail, and it's driving me mad... Anyone got any ideas? Thanks!


Solution

  • This definition is invalid:

    comment     [["//"[.]*\n] | ["/\*"[.]*"\*/"]]
    

    [ and ( are different. [...] is a character-class; that is, a list of possible characters which will match a single character. (...) is used to group regular expressions.
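The difference between a character class and a group can be checked outside flex. The sketch below uses C++ `std::regex` (ECMAScript syntax) as a stand-in; it is not flex, but it treats `[...]` and `(...)` the same way for this purpose. The helper name `full_match` is made up for the illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
bool full_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// "[//]" is a character class that lists '/' twice: it matches ONE slash.
const char* const slash_class = "[//]";

// "(//)" is a group around the two-character sequence "//".
const char* const slash_group = "(//)";
```

With these definitions, `full_match("/", slash_class)` holds but `full_match("//", slash_class)` does not, while `slash_group` behaves the other way round. That is why square brackets in the original definition cannot express "the string // followed by something".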

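    Also, as far as I know, flex does not ignore unquoted whitespace inside a pattern (in the rules section an unquoted space ends the pattern), so you can't put spaces around | for readability.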

    So I think that what you intended was:

    comment     ("//".*\n|"/*".*"*/")
    

    Here I've removed the incorrect square brackets, changed the ones which were used for grouping into parentheses, and removed the unnecessary grouping around the alternatives, since | has lower precedence than concatenation. I also removed the unnecessary backslash escapes, since quoting is sufficient to make a * into a character.

    However, that will not correctly match C++ comments:

    First, .* is greedy (i.e., it will match the longest possible string) so

    /* A comment */ a = 3; /* Another comment */
    

    will be incorrectly recognized as a single comment.
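The over-long match is easy to reproduce in a small C++ sketch. It uses `std::regex` rather than flex: the two engines work differently (flex picks the longest match, `std::regex` backtracks from a greedy `.*`), but on this input they should agree. `first_match` and `naive_block_comment` are names invented for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: first substring of `text` matched by `pattern`,
// or an empty string if nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    std::smatch m;
    return std::regex_search(text, m, std::regex(pattern)) ? m.str()
                                                           : std::string();
}

// ECMAScript analogue of the flex pattern "/*".*"*/".
const char* const naive_block_comment = R"(/\*.*\*/)";
```

Calling `first_match("/* A comment */ a = 3; /* Another comment */", naive_block_comment)` returns the entire input line, so the assignment in the middle is swallowed as part of the "comment".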

    Second, . does not match a newline character. So multi-line /* ... */ comments won't match, because .* won't reach to the end of the comment, only to the end of the line.
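
Both problems can be seen, and one possible way around them tried out, in the following C++ sketch. It again uses `std::regex` as a stand-in for flex, where `.` also skips newlines by default. The second pattern is one commonly used formulation that avoids `.` altogether; in flex it would be spelled roughly `"/*"([^*]|\*+[^*/])*\*+"/"`. Treat it as a starting point under those assumptions, not as a tested flex rule. The names are invented for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: first substring of `text` matched by `pattern`,
// or an empty string if nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    std::smatch m;
    return std::regex_search(text, m, std::regex(pattern)) ? m.str()
                                                           : std::string();
}

// Analogue of "/*".*"*/": '.' stops at a newline and '.*' is greedy.
const char* const naive_block_comment = R"(/\*.*\*/)";

// Comment body = any non-'*' character, or a run of '*' followed by
// something that is neither '*' nor '/'. A negated class such as [^*]
// does match a newline, and the body can never run past the first "*/".
const char* const block_comment = R"(/\*([^*]|\*+[^*/])*\*+/)";
```

On `"/* line one\nline two */"` the naive pattern finds nothing, while `block_comment` matches the whole comment. On `"/* A comment */ a = 3; /* Another comment */"`, `block_comment` stops after the first `*/`.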