
How do I write a lex or yacc rule that recognizes a wildcard identifier?


The text to be parsed contains file system commands such as:

infile abc*.txt
list abc*ff.txt

where abc*.txt is a wildcard pattern, like the arguments shell commands accept.

However, the text also contains mathematical expressions such as:

x=a*b

A typical expression rule in the yacc file looks like this:

expression: 
    expression '+' expression { $$ = $1 + $3;  }
    |   expression '-' expression { $$ = $1 - $3; }
    |   expression '*' expression { $$ = $1 * $3; }
    ;

Here * is the multiplication operator.

The lex rule that recognizes an IDENTIFIER token, with * allowed inside it, is:

[A-Za-z][A-Za-z0-9_\.\*]*   {
    yylval.strval = strdup(yytext);  return IDENTIFIER; }

For file system commands such as infile or list, shown at the beginning, the token that follows is taken as an IDENTIFIER and may contain * as a wildcard for matching filenames.

But an expression like

x = a*b

should be parsed as an expression. With the lex rule above, however, a*b is scanned as a single IDENTIFIER, so the statement becomes an assignment of the identifier a*b to x.

How can I keep the expression grammar and still recognize wildcard filenames in lex or yacc?


Solution

  • In flex this can be handled with start conditions. They are described well in the manual, with examples similar to your requirements.

    I wrote a small example lexer to demonstrate this:

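    ws [ \t\n\r]+
    %s FILENAME
    %%
    {ws}    ; /* skip whitespace */
    <<EOF>>    yyterminate(); /* an <<EOF>> action must return or terminate */
    <INITIAL>infile      BEGIN(FILENAME); /* the next word is a filename */
    <INITIAL>list         BEGIN(FILENAME); /* the next word is a filename */
    <FILENAME>[A-Za-z][A-Za-z0-9_\.\*]*     BEGIN(INITIAL); /* wildcard name, then back to normal */
    "*"               return(yytext[0]);
    "+"               return(yytext[0]);
    "-"               return(yytext[0]);
    "/"               return(yytext[0]);
    [A-Za-z][A-Za-z0-9_]*              return('I'); /* plain identifier: no '*' allowed */
    .                 printf("Bad character %c\n",yytext[0]);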
    

    I compiled it with flex's debug option (-d) and ran it to show how it works:

    C:\Users\Brian>flex -d  SOwildcard.l    
    C:\Users\Brian>gcc -o SOwildcard.exe lex.yy.c -lfl    
    C:\Users\Brian>SOwildcard
    --(end of buffer or a NUL)
    a + b
    --accepting rule at line 13 ("a")
    --accepting rule at line 4 (" ")
    --accepting rule at line 10 ("+")
    --accepting rule at line 4 (" ")
    --accepting rule at line 13 ("b")
    --(end of buffer or a NUL)
    infile a*.txt
    --accepting rule at line 4 ("
    ")
    --accepting rule at line 6 ("infile")
    --accepting rule at line 4 (" ")
    --accepting rule at line 8 ("a*.txt")
    --(end of buffer or a NUL)
    variable * identifier
    --accepting rule at line 4 ("
    ")
    --accepting rule at line 13 ("variable")
    --accepting rule at line 4 (" ")
    --accepting rule at line 9 ("*")
    --accepting rule at line 4 (" ")
    --accepting rule at line 13 ("identifier")
    --(end of buffer or a NUL)  
    list a*.*
    --accepting rule at line 4 ("
    ")
    --accepting rule at line 7 ("list")
    --accepting rule at line 4 (" ")
    --accepting rule at line 8 ("a*.*")
    --(end of buffer or a NUL)
    --accepting rule at line 4 ("
    ")
    -^C
    

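    I know you asked about lex, but I only have flex. Start conditions (%s and BEGIN) are also part of POSIX lex, so the approach should carry over. The <<EOF>> rule and the -d debug option, however, are flex extensions.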