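I have a flex program written in C++ that needs to enforce the following rules.
I want yytext to be accepted only if:
○ each of the characters ABCDEFGH appears zero or one times (no letter is repeated), and
○ the number of letters matches the shape (for example, exactly three for a triangle).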
For example, given these inputs:
"triangle ABC" is a valid shape, and I want the program to print "Valid shape".
"triangle AAC" is not a valid shape because it contains the letter A twice, and I want the program to print nothing in this case.
"triangle ABCD" is not a valid shape because it contains four letters, and I want the program to print nothing in this case either.
Here is my code, including the regular expressions I have tried so far:
%{
/** Methods and Variables initialization **/
%}
corner corner" "[A-H]
line line" "[A-H]{2}
triangle triangle" "[A-H]{3}
square rectangle" "[A-H]{4}
poly pentagon" "[A-H]{5}
hexa hexagon" "[A-H]{6}
hepta heptagon" "[A-H]{7}
octa octagon" "[A-H]{8}
/** Below is the rule section -- yytext is the matched string returned to the program **/
%%
{corner} |
{line} |
{triangle} |
{square} |
{poly} |
{hexa} |
{hepta} |
{octa} {
printf("Valid shape: %s", yytext);
}
.
%%
int main() {
yylex();
return 0;
}
// yywrap() is called by yylex() at end of input; returning 1 means there is no more input to scan
int yywrap(void)
{
return 1;
}
The current input:
triangle AAC
The current output:
Valid shape: triangle AAC (We don't want that)
The current input:
triangle ABCD
The current output:
Valid shape: triangle ABC
This is not the sort of problem you would typically use (f)lex for. The basic lexical analysis is trivial (you could simply split the line at the space), and detailed error analysis is somewhat outside (f)lex's comfort zone, particularly because there is no direct way to write "a string containing the same character twice" as a regular expression.
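Still, as the question asked by one of your classmates shows, it can be done with (f)lex by taking advantage of the scanner's ordering rules: when more than one pattern matches, flex picks the longest match, and if several matches are equally long, it picks the rule that appears first.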
That alone doesn't deal with duplicate letters. The only way to catch those with a regular expression is to enumerate all the possibilities, one per letter, which makes eight in this case. A simpler way of doing that than the one proposed in the linked question is:
dups [A-H]*A[A-H]*A[A-H]*|[A-H]*B[A-H]*B[A-H]*|[A-H]*C[A-H]*C[A-H]*|[A-H]*D[A-H]*D[A-H]*|...
That lets you write an ordered set of rules, something like this:
1. Match lines with duplicate characters
2. Match lines with too many characters
3. Match lines with exactly the right number of characters
4. Anything else is an error (too few characters, invalid shape name, invalid letter, etc.)
So the rules section might include something like this (leaving out the definitions of the two macros, dups and valid, which are straightforward but tedious, as well as the err() error-reporting function):
    /* 1. Dups */
[a-z]+\ {dups}$      { err("Duplicate letter"); }
    /* 2. Too long */
{valid}[A-H]+$       { err("Too long"); }
    /* 3. Just right */
{valid}$             { printf("Valid: %s\n", yytext); }
    /* 4. Anything else */
.+                   { err("Too short or invalid character"); }
    /* Ignore newlines */
\n                   ;