Syntax problem with Bison and Flex during compilation and execution

I'm currently working on a project that consists of parsing the content of a text file representing a plane ticket using Bison and Flex. I have created two files, ticket.y and ticket.l, to define the grammar rules and the corresponding regular expressions.

The example file I want to analyze is the following (ExampleAirplaneTicket.txt):

DOSSIER YBNUKR 
ANTOINE/DESAINT-EXUPERY
22/01/16 OS412  CDG 10:00  VIE 12:00    2:00
22/01/16 OS051  VIE 13:20  NRT +07:25   11:05 
23/01/16 OS8577 NRT 10:00  CHI 09:00    01:45

Here is the content of my billet.l (ticket.l) file:

%{
#include "billet.tab.h"
void yyerror(const char *s);
%}

DIGIT [0-9]
ALPHA [A-Za-z]
SEP [ \t]

%%

"DOSSIER"               { return DOSSIER; }
{ALPHA}{6}              { return CODE_DOSSIER; }
{ALPHA}{3}"/"           { yylval.sval = strdup(yytext); return CODE_AEROPORT; }
{ALPHA}{4}+("/"{ALPHA}+)?("-"{ALPHA}+)?  { yylval.sval = strdup(yytext); return NOM_PRENOM; }
{DIGIT}{2}"/"{DIGIT}{2}"/"{DIGIT}{2} { return DATE; }
{ALPHA}{2}{DIGIT}{2,4}  { return NUM_VOL; }
{ALPHA}{3}              { return CODE_AEROPORT; }
{DIGIT}{2}":"{DIGIT}{2} { yylval.sval = strdup(yytext); return HEURE_OR_DUREE_VOL; }
"+"                     { return PLUS; }
{SEP}+                  { }
\n                      { return NEWLINE; }
.                       { fprintf(stderr, "Caractère non autorisé: '%s'\n", yytext); exit(1); }

%%

And here is the content of my billet.y (ticket.y) file:

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void yyerror(const char *s);
int yylex();
%}

%union {
  char *sval;
}

%token DOSSIER CODE_DOSSIER NEWLINE PLUS
%token <sval> DATE NUM_VOL CODE_AEROPORT HEURE_OR_DUREE_VOL
%token <sval> NOM_PRENOM

%type <sval> nom_prenom
%type <sval> heure_arrivee
%type <sval> heure_avec_plus

%%

billet: DOSSIER CODE_DOSSIER NEWLINE infos_passager NEWLINE vols;

infos_passager: nom_prenom '/' nom_prenom NEWLINE { printf("Infos passager : %s / %s\n", $1, $3); };

vols: vol NEWLINE vols | vol NEWLINE;

vol: DATE NUM_VOL CODE_AEROPORT HEURE_OR_DUREE_VOL CODE_AEROPORT heure_arrivee HEURE_OR_DUREE_VOL { printf("Vol : %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7); };

heure_arrivee: heure_avec_plus | HEURE_OR_DUREE_VOL;

heure_avec_plus: PLUS HEURE_OR_DUREE_VOL { $$ = $2; };

nom_prenom: NOM_PRENOM;

%%

int main() {
    yyparse();
    return 0;
}

void yyerror(const char *s) {
    fprintf(stderr, "Erreur de syntaxe : %s\n", s);
}

Everything compiles, but I can't get my program to work on the ExampleAirplaneTicket.txt file. I simply get a syntax error, and despite several attempts I have not been able to fix it or even figure out where it comes from.

I am looking for help to understand and solve these problems. If you have any suggestions or advice on how to solve these errors, I would be very grateful.

I tried to implement a parser using Flex and Bison to parse a specific text format representing airline ticket information. I wrote the .l and .y files and made the necessary adjustments based on the previous problems. I expected the program to compile successfully and parse the file ExampleAirplaneTicket.txt without any syntax errors or other problems. Instead, when I test it with the file I just get a syntax error, with no idea where it comes from.

When I compile billet.l I get these warnings (I don't think they are a problem):

billet.l:17: warning, the rule can't match
billet.l:20: warning, the rule can't match

I get no warnings when compiling billet.y, nor when compiling everything with gcc.

But when I run it on the text file, I get this:

Syntax error

UPDATE:

I combined the tokens HEURE and DUREE_VOL into a single token, HEURE_OR_DUREE_VOL (time or flight duration). The files above have been updated accordingly.

The warnings on lines 17 and 20 are gone, but I now get a warning on line 18 when compiling billet.l.

I still get the same syntax error when running the program on the text file.

File billet.l:

I changed the pattern on line 15 from {ALPHA}+("/"{ALPHA}+)?("-"{ALPHA}+)? to {ALPHA}{4}+("/"{ALPHA}+)?("-"{ALPHA}+)?, intending to require the name to have at least 4 letters.
I also changed the token returned on line 15 from NOM_PRENOM to STRING.

File billet.y:

I added a new token, <sval> STRING, to the list of tokens.
I modified the nom_prenom rule to accept either STRING or STRING / STRING.

I now get new warnings when compiling billet.y (and none from billet.l anymore):

ticket.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
ticket.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples

And I still get a syntax error when I run the program on ExampleAirplaneTicket.txt.

Solution

The following warnings from flex:

billet.l:17: warning, the rule can't match
billet.l:20: warning, the rule can't match

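come from the fact that the NOM_PRENOM rule matches everything the CODE_AEROPORT rule matches, and the HEURE rule has exactly the same pattern as the DUREE_VOL rule.

When several rules match the same input, flex picks the longest match and, on a tie, the rule listed first. A later rule that can only ever tie with (or be outlasted by) an earlier one therefore never fires, which is what "the rule can't match" means.

So some tokens (CODE_AEROPORT and DUREE_VOL) are never returned by the lexer, and the parser never sees the token sequence its grammar expects. This may be why you get the generic "syntax error" message.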
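To make that selection policy concrete, here is a small C sketch that mimics how flex chooses between rules at one input position: longest match first, then earliest rule. It uses POSIX <regex.h> extended regular expressions as a stand-in for flex patterns (the two syntaxes are close but not identical), and `struct rule` and `pick_rule` are hypothetical names used only for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* One lexer rule: a POSIX extended regex standing in for a flex pattern. */
struct rule {
    const char *name;     /* token name, for display only */
    const char *pattern;  /* anchored with '^' so it only matches at the current position */
};

/* Return the index of the rule flex would pick at `input` (longest match,
   ties going to the rule listed first), or -1 if none matches.
   The winning match length is stored in *len. */
int pick_rule(const struct rule *rules, size_t n, const char *input, size_t *len)
{
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < n; i++) {
        regex_t re;
        regmatch_t m;
        if (regcomp(&re, rules[i].pattern, REG_EXTENDED) != 0)
            continue;                 /* skip a pattern that does not compile */
        if (regexec(&re, input, 1, &m, 0) == 0) {
            size_t l = (size_t)(m.rm_eo - m.rm_so);
            if (l > best_len) {       /* strictly longer: an equal-length later rule loses */
                best = (int)i;
                best_len = l;
            }
        }
        regfree(&re);
    }
    *len = best_len;
    return best;
}
```

With the question's original NOM_PRENOM pattern listed before CODE_AEROPORT, "CDG" is matched by both with length 3, so NOM_PRENOM wins the tie and CODE_AEROPORT is never chosen. Swapping the two entries lets three-letter words come out as CODE_AEROPORT while longer names still go to NOM_PRENOM, but the patterns still overlap.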

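Note: the C source file generated by Bison shows that the plain "syntax error" message is used when there are no tokens to report (the internal variable yycount is 0). This message table is only consulted when verbose error messages are enabled with %define parse.error verbose, as in the updated grammar further down; without that option, Bison always reports a plain "syntax error". The relevant excerpt: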

/*
[...]
     - The only way there can be no lookahead present (in yychar) is if
       this state is a consistent state with a default action.  Thus,
       detecting the absence of a lookahead is sufficient to determine
       that there is no unexpected or expected token to report.  In that
       case, just report a simple "syntax error".
[...]
*/
[...]
  switch (yycount)
    {
# define YYCASE_(N, S)                      \
      case N:                               \
        yyformat = S;                       \
      break
      YYCASE_(0, YY_("syntax error"));
      YYCASE_(1, YY_("syntax error, unexpected %s"));
      YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
      YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
      YYCASE_(4, YY_("syntax error, unexpected %s, expecting %s or %s or %s"));
      YYCASE_(5, YY_("syntax error, unexpected %s, expecting %s or %s or %s or %s"));
# undef YYCASE_
    }

Update

Update following the latest changes to the post: there are still ambiguities in your lexical analyzer, because several of its patterns overlap. The work of telling the inputs apart should be done in the grammar instead. Here is a proposal in which the lexer returns fewer, more generic tokens and the grammar rules are more detailed.

Here is the simplified lexical analyzer (billet.l):

%{
#include <stdio.h>   /* fprintf */
#include <stdlib.h>  /* exit */
#include <string.h>  /* strdup */
#include "billet.tab.h"
void yyerror(const char *s);
%}

DIGIT [0-9]
ALPHA [A-Za-z]
ALPHA2 [-A-Za-z]
SEP [ \t]

%%

"DOSSIER"               { return DOSSIER; }
{ALPHA2}+               { yylval.sval = strdup(yytext); return STRING; }
{DIGIT}+                { yylval.sval = strdup(yytext); return NUM; }
{ALPHA}{2}{DIGIT}{2,4}  { yylval.sval = strdup(yytext); return NUM_VOL; }
"+"                     { return PLUS; }
{SEP}+                  { }
\n                      { return NEWLINE; }
"/"                     { return SLASH; }
":"                     { return COLON; }
.                       { fprintf(stderr, "Caractère non autorisé: '%s'\n", yytext); exit(1); }

%%
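To check which token stream the parser receives from this lexer, here is a hand-written C approximation of the rules above. It is not what flex generates, and `enum tok` and `next_token` are hypothetical names; it only applies the same longest-match, first-rule-wins policy to the same patterns.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds named after the tokens returned by the simplified billet.l. */
enum tok { T_END, T_DOSSIER, T_STRING, T_NUM, T_NUM_VOL,
           T_PLUS, T_NEWLINE, T_SLASH, T_COLON, T_ERROR };

/* {ALPHA2} = [-A-Za-z] */
static int is_alpha2(char c) { return isalpha((unsigned char)c) || c == '-'; }

/* Return the next token at *p and advance *p past it.
   Like flex: the longest match wins; on a tie, the earlier rule wins. */
enum tok next_token(const char **p)
{
    const char *s = *p;
    while (*s == ' ' || *s == '\t')               /* {SEP}+ is skipped */
        s++;
    if (*s == '\0') {
        *p = s;
        return T_END;
    }

    /* Length each multi-character rule would match at s (0 = no match). */
    size_t str_len = 0, num_len = 0, vol_len = 0;
    while (is_alpha2(s[str_len]))                 /* {ALPHA2}+ */
        str_len++;
    while (isdigit((unsigned char)s[num_len]))    /* {DIGIT}+ */
        num_len++;
    if (isalpha((unsigned char)s[0]) && isalpha((unsigned char)s[1])) {
        size_t d = 0;                             /* {ALPHA}{2}{DIGIT}{2,4} */
        while (d < 4 && isdigit((unsigned char)s[2 + d]))
            d++;
        if (d >= 2)
            vol_len = 2 + d;
    }

    if (vol_len > str_len) {   /* when NUM_VOL matches, STRING only covers the 2 letters */
        *p = s + vol_len;
        return T_NUM_VOL;
    }
    if (str_len > 0) {
        *p = s + str_len;
        /* "DOSSIER" is listed before {ALPHA2}+, so it wins an exact tie. */
        return (str_len == 7 && strncmp(s, "DOSSIER", 7) == 0) ? T_DOSSIER : T_STRING;
    }
    if (num_len > 0) {
        *p = s + num_len;
        return T_NUM;
    }

    *p = s + 1;                /* single-character rules */
    switch (*s) {
    case '+':  return T_PLUS;
    case '/':  return T_SLASH;
    case ':':  return T_COLON;
    case '\n': return T_NEWLINE;
    default:   return T_ERROR; /* billet.l prints a message and exits here */
    }
}
```

For the second flight line, this yields NUM SLASH NUM SLASH NUM NUM_VOL STRING NUM COLON NUM STRING PLUS NUM COLON NUM NUM COLON NUM NEWLINE, which is the shape the vol rule in the grammar expects: date, flight number, airport, time, airport, time with an optional "+", and duration.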

And a slightly more elaborate parser (billet.y):

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void yyerror(const char *s);
int yylex();
%}

%union {
  char *sval;
}

%token DOSSIER CODE_DOSSIER NEWLINE PLUS SLASH COLON
%token <sval> DATE NUM_VOL STRING NUM
%token <sval> NOM_PRENOM

%type <sval> duree_vol
%type <sval> heure
%type <sval> airport
%type <sval> nom_prenom
%type <sval> date
%type <sval> heure_arrivee
%type <sval> heure_avec_plus
%define parse.error verbose
%%

liste : billet liste | billet;

billet: DOSSIER STRING NEWLINE infos_passager NEWLINE vols;

infos_passager: nom_prenom { printf("Infos passager : %s\n", $1); free($1); };

vols: vol NEWLINE vols | vol NEWLINE;

vol: date NUM_VOL airport heure airport heure_arrivee duree_vol
     {
       printf("Vol : %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7);
       /* every semantic value here was allocated with strdup, so release it */
       free($1); free($2); free($3); free($4); free($5); free($6); free($7);
     };

duree_vol : heure;

heure : NUM COLON NUM { char str[20]; snprintf(str, sizeof(str), "%s:%s", $1, $3); $$ = strdup(str); free($1); free($3); };

airport : STRING;

date: NUM SLASH NUM SLASH NUM { char str[20]; snprintf(str, sizeof(str), "%s/%s/%s", $1, $3, $5); $$ = strdup(str); free($1); free($3); free($5); };

heure_arrivee: heure_avec_plus | heure;

heure_avec_plus: PLUS heure { $$ = $2; };

nom_prenom: STRING | STRING SLASH STRING { char str[120]; snprintf(str, sizeof(str), "%s/%s", $1, $3); $$ = strdup(str); free($1); free($3); };

%%

int main(void) {
    /* yyparse returns 0 on success and non-zero if parsing failed */
    return yyparse();
}

void yyerror(const char *s) {
    fprintf(stderr, "Erreur de syntaxe : %s\n", s);
}
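The heure, date and nom_prenom actions build their values in fixed-size stack buffers (20 and 120 bytes) before calling strdup, so snprintf would silently truncate an unusually long name. If that matters, a small helper that sizes the allocation exactly could replace those buffers. This is only a sketch: `join2` is a hypothetical helper, not something Bison or Flex provides.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly malloc'ed string "<a><sep><b>",
   sized exactly, or NULL if the allocation fails. The caller frees it,
   just like the strdup'ed semantic values in billet.y. */
char *join2(const char *a, const char *sep, const char *b)
{
    /* With a NULL buffer and size 0, snprintf only reports the length needed. */
    int n = snprintf(NULL, 0, "%s%s%s", a, sep, b);
    if (n < 0)
        return NULL;
    char *out = malloc((size_t)n + 1);   /* +1 for the terminating '\0' */
    if (out != NULL)
        snprintf(out, (size_t)n + 1, "%s%s%s", a, sep, b);
    return out;
}
```

For example, the second alternative of nom_prenom could become STRING SLASH STRING { $$ = join2($1, "/", $3); free($1); free($3); }. Since the actions are compiled into yyparse before the epilogue, join2 would need to be declared in the %{ ... %} prologue if it is defined after the second %%.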

Build it:

$ flex billet.l
$ bison -d billet.y
$ gcc billet.tab.c lex.yy.c -lfl

And run it with something like:

$ ./a.out < input.txt
Infos passager : ANTOINE/DESAINT-EXUPERY
Vol : 22/01/16 OS412 CDG 10:00 VIE 12:00 2:00
Vol : 22/01/16 OS051 VIE 13:20 NRT 07:25 11:05
Vol : 23/01/16 OS8577 NRT 10:00 CHI 09:00 01:45