Tags: c, bison, flex-lexer

Why does Bison just print the input?


Bison always prints the input instead of running the action.

I'm just starting out with Bison and I'm trying to get it working with the simplest possible rule.

Lexer


%{
    #include <stdio.h>
    #include "wip.tab.h"
%}

%%

[\t\n ]+         ;
[a−z]+  { yylval.sval = strdup(yytext); return IDENTIFIER;}

%%

Parser


%{
    #include <stdio.h>
    #include <stdlib.h>
    int yylex(void);
    void yyerror(char const *);
    extern FILE *yyin;  /* defined in the flex-generated lexer */
%}

%union{
    char *sval;
}

%token IDENTIFIER

%%
input:
    %empty
    | input line
    ;

line:
    '\n'
    | IDENTIFIER {printf("OK\n");}
    ;
%%

int main(void) {
    FILE *myfile = fopen("example.wip", "r");
    if (!myfile) {
        printf("File can't be opened\n");
        return -1;    
    }
    yyin = myfile;
    yyparse();   
}

void yyerror(char const *s) {
    fprintf(stderr, "%s\n", s);
}

The "example.wip" input file

hello

I expect "OK" to be printed in my terminal, but the parser just prints the content of the file. Thanks in advance.


Solution

  • Bison always prints the input instead of running the action.

    Bison-generated parsers never print the input unless that's what the actions say. Since none of your actions print anything other than "OK", that can't be what's going on here.

    However, by default flex-generated lexers do print the input when they see a character that they don't recognize. To verify that this is what's going on, we can add a rule at the end of your lexer file that prints a proper error message for unrecognized characters:

    .      { fprintf(stderr, "Unrecognized character: '%c'\n", yytext[0]); }
    

    And sure enough, this will tell us that all the characters in "hello" are unrecognized.

    So what's wrong with the [a−z]+ pattern? Why doesn't it match "hello"? What's wrong is the − character. It's not a regular ASCII dash (-), but a Unicode dash that has no special meaning to flex. So flex interprets [a−z] as a character class that can match one of three characters (a, the Unicode dash, or z), not as a range from a to z.

    To fix this, just replace it with a regular ASCII dash (-), so that the pattern reads [a-z]+.