
Why is my bison/flex not working as intended?


I have a homework assignment in which I have to transform some input into a particular output. The problem is that only the first line is converted into the output I need; the other lines produce a "syntax error".

Additionally, if I change the order of the lines, none of them is converted, so only one particular line works.

This is my input file:

Input.txt

B0102 Bobi 2017/01/16 V8 1, massage 12.50
J1841 Jeco 20.2 2017/01/17 V8 2, Tosse 2, tosquia 22.50
B2232 Bobi 2017/01/17 Tosse 1, Leptospirose 1, bath 30.00, massage 12.50
B1841 Jeco 21.4 2017/01/18 Leptospirose 1, Giardiase 2

And this is the output I should obtain:

Output

Bobi (B0102) paid 2 services/vaccines 22.50
Jeco (J1841) paid 3 services/vaccines 62.50
Bobi (B2232) paid 4 services/vaccines 62.50
Jeco (B1841) paid 2 services/vaccines 30.00

If I change the line order in the input file, not even the first line is converted. However, if the order is as I showed above, this is my output:

Bobi (B0102) paid 2 services/vaccines 22.50
syntax error

This is my code:

file.y

%{
    #include "file.h"
    #include <stdio.h>
    int yylex();
    int counter = 0;
    int vaccineCost = 10;
%}

%union{
    char* code;
    char* name;
    float value;
    int quantity;
};

%token COMMA WEIGHT DATE SERVICE VACCINE
%token CODE
%token NAME
%token VALUE
%token QUANTITY

%type <name> NAME
%type <code> CODE
%type <value> VALUE
%type <quantity> QUANTITY
%type <value> services


%start begining

%%

begining: /*empty*/
    | animal
    ;

animal: CODE NAME WEIGHT DATE services {printf("%s (%s) paid %d services/vaccines %.2f\n", $2, $1, counter, $5); counter = 0;}
    | CODE NAME DATE services {printf("%s (%s) paid %d services/vaccines %.2f\n", $2, $1, counter, $4); counter = 0;}
    ;

services: services COMMA SERVICE VALUE {$$ = $1 + $4; counter++;}
    | services COMMA VACCINE QUANTITY{$$ = $1 + $4*vaccineCost;counter++;}
    | SERVICE VALUE{$$ = $2;counter++;}
    | VACCINE VALUE 
{$$ = $2*vaccineCost;counter++;}
    ;

%%

int main(){
    yyparse();
    return 0;
}

void yyerror (char const *s) {
    fprintf (stderr, "%s\n", s);
}

file.flex

%option noyywrap

%{
    #include "file.h"
    #include "file.tab.h"
    #include <stdio.h>
    #include <string.h>
%}

/*Patterns*/
YEAR 20[0-9]{2}
MONTH 0[1-9]|1[0-2]
DAY 0[1-9]|[1-2][0-9]|3[0-1]

%%
,                                   {return COMMA;}
[A-Z][0-9]{4}            {yylval.code = strdup(yytext); return CODE;}       
[A-Z][a-z]*          {yylval.name = strdup(yytext); return NAME;}
[0-9]+[.][0-9]                             {return WEIGHT;}
{YEAR}"/"{MONTH}"/"{DAY}                           {return DATE;}
(banho|massagem|tosquia)                    {return SERVICE;}
[0-9]+\.[0-9]{2}              {yylval.value = atof(yytext);return VALUE;}
(V8|V10|Anti-Rabatica|Giardiase|Tosse|Leptospirose)          {return VACCINE;}
[1-9]           {yylval.quantity = atoi(yytext);return QUANTITY;}
\n  
.       
<<EOF>> return 0;

%%

And these are the commands I execute:

bison -d file.y
flex -o file.c file.flex
gcc file.tab.c file.c -o exec -lfl
./exec < Input.txt

Can anyone point me in the right direction or tell me what is wrong with my code?

Thanks, and if my explanation wasn't good enough I'll try my best to explain it better!


Solution

  • There are at least two different problems which cause those symptoms.

    1. Your top-level grammar only accepts at most a single animal:

      begining: /*empty*/
          | animal
      

      So an input containing more than one line won't be allowed. You need a top-level rule which accepts any number of animals. (By the way, modern bison versions let you write %empty as the right-hand side of an empty production, instead of having to (mis)use a comment.)

    2. The order of your scanner rules means that most of the words you want to recognise as VACCINE will instead be recognised as NAME. Recall that when two patterns match the same text (that is, a match of the same length), the first one in the file will win. So with these rules:

      [A-Z][a-z]*          {yylval.name = strdup(yytext); return NAME;}
      (V8|V10|Anti-Rabatica|Giardiase|Tosse|Leptospirose)          {return VACCINE;}


      Tokens like Tosse, which could match either rule, will be assumed to match the first rule. Only V8, V10 and Anti-Rabatica, which [A-Z][a-z]* doesn't match in full, will fall through to the second rule. So your first input line doesn't trigger this problem, but all the other ones do.
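
    One possible fix for the second problem, sketched here on the assumption that the rest of the scanner stays as posted, is to move the specific vaccine pattern above the general name pattern:

    ```lex
    (V8|V10|Anti-Rabatica|Giardiase|Tosse|Leptospirose)    { return VACCINE; }
    [A-Z][a-z]*          { yylval.name = strdup(yytext); return NAME; }
    ```

    With this order, Tosse matches both patterns at the same length, so the earlier VACCINE rule wins. A longer word such as Tossed would still be returned as NAME, because the longest match takes priority over rule order.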

    You probably should handle newline characters syntactically, unless you allow treatment records to be split over multiple lines. And be aware that many (f)lex versions do not allow empty actions, as in your last two flex rules. This may cause lexical errors.
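
    A minimal sketch of a top level that accepts any number of records and treats each newline as a record terminator follows. The NEWLINE token and the rule names input and line are assumptions made for illustration and are not part of the posted code. The scanner would have to return NEWLINE for \n instead of discarding it, and %empty assumes bison 3.0 or later.

    ```yacc
    %token NEWLINE          /* hypothetical: scanner rule would be  \n { return NEWLINE; }  */

    %start input

    %%

    input: %empty
        | input line         /* left recursion: zero or more lines */
        ;

    line: NEWLINE            /* tolerate blank lines */
        | animal NEWLINE     /* one record per line; the animal rule is unchanged */
        ;
    ```

    With this layout a final record that is not followed by a newline would be rejected, so the input file should end with a newline.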

    And finally

    <<EOF>> return 0;
    

    is unnecessary. That's how the scanner handles end-of-file by default. <<EOF>> rules are often wrong or redundant, and should only be used when clearly needed (and with great care).