The text to be parsed contains file-system commands such as:
infile abc*.txt
list abc*ff.txt
Here abc*.txt is a wildcard argument, like the ones shell commands accept.
However, the text also contains mathematical expressions such as:
x=a*b
A typical expression rule in the yacc file looks like this:
expression:
expression '+' expression { $$ = $1 + $3; }
| expression '-' expression { $$ = $1 - $3; }
| expression '*' expression { $$ = $1 * $3; }
;
Here * is the multiplication operator.
The lex rule that recognizes an IDENTIFIER, allowing * inside it, is:
[A-Za-z][A-Za-z0-9_\.\*]* {
yylval.strval = strdup(yytext); return IDENTIFIER; }
For file-system commands such as infile or list, shown at the beginning, the next token is taken as an IDENTIFIER and may contain * as a wildcard to match file names.
But consider an expression like
x = a*b
This should be parsed as a multiplication, but with the lex rule above, a*b is matched as a single IDENTIFIER, so the statement assigns the value of an identifier named a*b to x.
How can I keep the expression grammar and still support wildcard file names in lex or yacc?
In flex this can be handled with start conditions, which are well described in the flex manual, with examples similar to your requirements.
I wrote a small example lexer to demonstrate this:
ws [ \t\n\r]+
%s FILENAME
%%
{ws} ; /* skip */
<<EOF>> yyterminate(); /* an <<EOF>> action must end scanning or switch input */
<INITIAL>infile BEGIN(FILENAME);
<INITIAL>list BEGIN(FILENAME);
<FILENAME>[A-Za-z][A-Za-z0-9_\.\*]* BEGIN(INITIAL);
"*" return(yytext[0]);
"+" return(yytext[0]);
"-" return(yytext[0]);
"/" return(yytext[0]);
[A-Za-z][A-Za-z0-9_]* return('I');
. printf("Bad character %c\n",yytext[0]);
I ran it in debug mode (flex -d) to show its operation:
C:\Users\Brian>flex -d SOwildcard.l
C:\Users\Brian>gcc -o SOwildcard.exe lex.yy.c -lfl
C:\Users\Brian>SOwildcard
--(end of buffer or a NUL)
a + b
--accepting rule at line 13 ("a")
--accepting rule at line 4 (" ")
--accepting rule at line 10 ("+")
--accepting rule at line 4 (" ")
--accepting rule at line 13 ("b")
--(end of buffer or a NUL)
infile a*.txt
--accepting rule at line 4 ("
")
--accepting rule at line 6 ("infile")
--accepting rule at line 4 (" ")
--accepting rule at line 8 ("a*.txt")
--(end of buffer or a NUL)
variable * identifier
--accepting rule at line 4 ("
")
--accepting rule at line 13 ("variable")
--accepting rule at line 4 (" ")
--accepting rule at line 9 ("*")
--accepting rule at line 4 (" ")
--accepting rule at line 13 ("identifier")
--(end of buffer or a NUL)
list a*.*
--accepting rule at line 4 ("
")
--accepting rule at line 7 ("list")
--accepting rule at line 4 (" ")
--accepting rule at line 8 ("a*.*")
--(end of buffer or a NUL)
--accepting rule at line 4 ("
")
-^C
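I know you asked about lex, but I only have flex, so I have not tested this with lex. POSIX lex also supports start conditions, so the approach should be similar, although <<EOF>> is a flex extension.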