
Syntax problem with Bison and Flex during compilation and execution


I'm currently working on a project that involves parsing the content of a text file representing a plane ticket, using bison and flex. I have created two files, ticket.y and ticket.l, which define the grammar rules and the corresponding regular expressions.

The example file I want to analyze is the following (ExampleAirplaneTicket.txt):

DOSSIER YBNUKR 
ANTOINE/DESAINT-EXUPERY
22/01/16 OS412  CDG 10:00  VIE 12:00    2:00
22/01/16 OS051  VIE 13:20  NRT +07:25   11:05 
23/01/16 OS8577 NRT 10:00  CHI 09:00    01:45

Here is the content of my billet.l (ticket.l) file:

%{
#include "billet.tab.h"
void yyerror(const char *s);
%}

DIGIT [0-9]
ALPHA [A-Za-z]
SEP [ \t]

%%

"DOSSIER"               { return DOSSIER; }
{ALPHA}{6}              { return CODE_DOSSIER; }
{ALPHA}{3}"/"           { yylval.sval = strdup(yytext); return CODE_AEROPORT; }
{ALPHA}{4}+("/"{ALPHA}+)?("-"{ALPHA}+)?  { yylval.sval = strdup(yytext); return NOM_PRENOM; }
{DIGIT}{2}"/"{DIGIT}{2}"/"{DIGIT}{2} { return DATE; }
{ALPHA}{2}{DIGIT}{2,4}  { return NUM_VOL; }
{ALPHA}{3}              { return CODE_AEROPORT; }
{DIGIT}{2}":"{DIGIT}{2} { yylval.sval = strdup(yytext); return HEURE_OR_DUREE_VOL; }
"+"                     { return PLUS; }
{SEP}+                  { }
\n                      { return NEWLINE; }
.                       { fprintf(stderr, "Caractère non autorisé: '%s'\n", yytext); exit(1); }

%%

And here is the content of my billet.y (ticket.y) file:

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void yyerror(const char *s);
int yylex();
%}

%union {
  char *sval;
}

%token DOSSIER CODE_DOSSIER NEWLINE PLUS
%token <sval> DATE NUM_VOL CODE_AEROPORT HEURE_OR_DUREE_VOL
%token <sval> NOM_PRENOM

%type <sval> nom_prenom
%type <sval> heure_arrivee
%type <sval> heure_avec_plus

%%

billet: DOSSIER CODE_DOSSIER NEWLINE infos_passager NEWLINE vols;

infos_passager: nom_prenom '/' nom_prenom NEWLINE { printf("Infos passager : %s / %s\n", $1, $3); };

vols: vol NEWLINE vols | vol NEWLINE;

vol: DATE NUM_VOL CODE_AEROPORT HEURE_OR_DUREE_VOL CODE_AEROPORT heure_arrivee HEURE_OR_DUREE_VOL { printf("Vol : %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7); };

heure_arrivee: heure_avec_plus | HEURE_OR_DUREE_VOL;

heure_avec_plus: PLUS HEURE_OR_DUREE_VOL { $$ = $2; };

nom_prenom: NOM_PRENOM;

%%

int main() {
    yyparse();
    return 0;
}

void yyerror(const char *s) {
    fprintf(stderr, "Erreur de syntaxe : %s\n", s);
}

Everything compiles, but when I run the program on ExampleAirplaneTicket.txt I only get a syntax error, and despite several attempts I have not been able to fix it or even figure out where it comes from.

I would be very grateful for any suggestions on how to understand and solve this.

I wrote the .l and .y files and adjusted them after earlier problems. I expected the program to parse ExampleAirplaneTicket.txt without errors; instead it reports a syntax error with no indication of where it occurs.

When I compile billet.l I get this warning (I don't think it's a problem):

billet.l:17: warning, the rule can't match
billet.l:20: warning, the rule can't match

Neither bison (on billet.y) nor gcc reports any warning.

But when I run the program on the text file I get this:

Syntax error

UPDATE:

I combined the tokens HEURE and DUREE_VOL into a single token, HEURE_OR_DUREE_VOL. The files above have been updated.

The warnings on lines 17 and 20 are gone, but I now get a warning on line 18 when compiling billet.l.

And I still get the same syntax error when I run the program on the text file.

Changes to billet.l:

I changed the pattern on line 15 from {ALPHA}+("/"{ALPHA}+)?("-"{ALPHA}+)? to {ALPHA}{4}+("/"{ALPHA}+)?("-"{ALPHA}+)?, on the assumption that a name has at least 4 letters.
I also changed the token returned on line 15 from NOM_PRENOM to STRING.

Changes to billet.y:

I added a new token, <sval> STRING, to the list of tokens.
I modified the nom_prenom rule to accept either STRING or STRING / STRING.

flex no longer reports anything for billet.l, but bison now gives a new warning for billet.y:

ticket.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
ticket.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples

And I still get a syntax error when I run the program on ExampleAirplaneTicket.txt.


Solution

  • The following warnings from flex:

    billet.l:17: warning, the rule can't match
    billet.l:20: warning, the rule can't match
    

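    appear because flex resolves overlapping patterns by taking the longest match and, on a tie, the rule listed first. A later rule that an earlier rule always beats can therefore never match:

    • the NOM_PRENOM pattern also matches everything the CODE_AEROPORT pattern matches
    • the HEURE pattern is identical to the DUREE_VOL pattern

    So the lexer never returns CODE_AEROPORT or DUREE_VOL, although the grammar expects them. This may be the reason why you get the default "Syntax error" message.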

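    Note: The C source file generated by bison shows that the bare "Syntax error" is reported when the number of reported tokens (yycount internal variable) is 0. Declaring %define parse.error verbose in the grammar makes bison name the unexpected and expected tokens instead, which helps locate the failure: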

    /*
    [...]
         - The only way there can be no lookahead present (in yychar) is if
           this state is a consistent state with a default action.  Thus,
           detecting the absence of a lookahead is sufficient to determine
           that there is no unexpected or expected token to report.  In that
           case, just report a simple "syntax error".
    [...]
    */
    [...]
      switch (yycount)
        {
    # define YYCASE_(N, S)                      \
          case N:                               \
            yyformat = S;                       \
          break
          YYCASE_(0, YY_("syntax error"));
          YYCASE_(1, YY_("syntax error, unexpected %s"));
          YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
          YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
          YYCASE_(4, YY_("syntax error, unexpected %s, expecting %s or %s or %s"));
          YYCASE_(5, YY_("syntax error, unexpected %s, expecting %s or %s or %s or %s"));
    # undef YYCASE_
        }
    

    Update

    Following the latest changes to the post: your lexical analyzer is still ambiguous. An airport code, a booking code and a name are all runs of letters, so the lexer cannot reliably tell them apart; that discrimination belongs in the grammar, which knows the context. Here is a proposal with fewer token types in the lexer and more detailed rules in the grammar.

    Here is the simplified lexical analyzer (billet.l):

    %{
    #include "billet.tab.h"
    void yyerror(const char *s);
    %}
    
    DIGIT [0-9]
    ALPHA [A-Za-z]
    ALPHA2 [-A-Za-z]
    SEP [ \t]
    
    %%
    
    "DOSSIER"               { return DOSSIER; }
    {ALPHA2}+               { yylval.sval = strdup(yytext); return STRING; }
    {DIGIT}+                { yylval.sval = strdup(yytext); return NUM; }
    {ALPHA}{2}{DIGIT}{2,4}  { yylval.sval = strdup(yytext); return NUM_VOL; }
    "+"                     { return PLUS; }
    {SEP}+                  { }
    \n                      { return NEWLINE; }
    "/"                     { return SLASH; }
    ":"                     { return COLON; }
    .                       { fprintf(stderr, "Caractère non autorisé: '%s'\n", yytext); exit(1); }
    
    %%
    

    And a slightly more elaborate parser (billet.y):

    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    void yyerror(const char *s);
    int yylex();
    %}
    
    %union {
      char *sval;
    }
    
    %token DOSSIER CODE_DOSSIER NEWLINE PLUS SLASH COLON
    %token <sval> DATE NUM_VOL STRING NUM
    %token <sval> NOM_PRENOM
    
    %type <sval> duree_vol
    %type <sval> heure
    %type <sval> airport
    %type <sval> nom_prenom
    %type <sval> date
    %type <sval> heure_arrivee
    %type <sval> heure_avec_plus
    %define parse.error verbose
    %%
    
    liste : billet liste | billet
    
    billet: DOSSIER STRING NEWLINE infos_passager NEWLINE vols;
    
    infos_passager: nom_prenom { printf("Infos passager : %s\n", $1); free($1); };
    
    vols: vol NEWLINE vols | vol NEWLINE;
    
    vol: date NUM_VOL airport heure airport heure_arrivee duree_vol { printf("Vol : %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7); free($1); free($2); free($3); free($4); free($5); free($6); free($7); };
    
    duree_vol : heure 
    
    heure : NUM COLON NUM { char str[20]; snprintf(str, sizeof(str), "%s:%s", $1, $3); $$ = strdup(str); free($1); free($3); }
    
    airport : STRING
    
    date: NUM SLASH NUM SLASH NUM { char str[20]; snprintf(str, sizeof(str), "%s/%s/%s", $1, $3, $5); $$ = strdup(str); free($1); free($3); free($5); }
    
    heure_arrivee: heure_avec_plus | heure;
    
    heure_avec_plus: PLUS heure { $$ = $2; };
    
    nom_prenom: STRING | STRING SLASH STRING { char str[120]; snprintf(str, sizeof(str), "%s/%s", $1, $3); $$ = strdup(str); free($1); free($3); };
    
    %%
    
    int main() {
        yyparse();
        return 0;
    }
    
    void yyerror(const char *s) {
        fprintf(stderr, "Erreur de syntaxe : %s\n", s);
    }
    

    Build it:

    $ flex billet.l
    $ bison -d billet.y
    $ gcc billet.tab.c lex.yy.c -lfl
    

    And run it with something like:

    $ ./a.out < input.txt
    Infos passager : ANTOINE/DESAINT-EXUPERY
    Vol : 22/01/16 OS412 CDG 10:00 VIE 12:00 2:00
    Vol : 22/01/16 OS051 VIE 13:20 NRT 07:25 11:05
    Vol : 23/01/16 OS8577 NRT 10:00 CHI 09:00 01:45