Tags: c, yacc, lex, c89

Why does lex invoke yyerror while parsing comma-separated values?


I am preparing a yacc/lex test program. The lexer is intended to read integer numbers (long), floating-point numbers (double) and date-times in a specific format (YYYYMMDD HHMM).

lexer.l

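%{
#include <stdio.h>      /* printf, sscanf, fprintf */
#include <stdlib.h>     /* exit */
#include <string.h>     /* memcpy, memset */
#include <time.h>       /* struct tm, mktime, strftime */
#include "grammar.h"

void read_float_number(void);
void read_integer_number(void);
void read_date_YYYYMMDD_HHMM(void);
void yyerror(const char* msg);

%}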

%%

                                                                        /* SKIP BLANKS AND TABS */
[\t ]                                                                   { ; }

                                                                        /* YYYYMMDD HHMM DATE */
[12][09][0-9][0-9][0-1][0-9][0-3][0-9][ ][0-2][0-9][0-5][0-9]           { read_date_YYYYMMDD_HHMM(); return DATETIME; }

                                                                        /* FLOAT NUMBER */
[0-9]+\.[0-9]+                                                          { read_float_number(); return FLOAT_NUMBER; }

                                                                        /* INTEGER NUMBER */
[0-9]+                                                                  { read_integer_number(); return INTEGER_NUMBER; }

%%

/* READ FLOAT NUMBER */
void read_float_number(void) {
        sscanf(yytext, "%lf", &yylval.float_number);
}

/* READ INTEGER NUMBER */
void read_integer_number(void) {
        sscanf(yytext, "%ld", &yylval.integer_number);
}

/* READ YYYYMMDD HHMM DATE */
void read_date_YYYYMMDD_HHMM(void) {

        /*  DATETIME STRUCT TM */
        struct tm dt;
        char buffer[80];

        /* CLEAR ALL FIELDS (tm_sec IS NOT SET FROM THE INPUT) */
        memset(&dt, 0, sizeof(dt));

        /* READ VALUES */
        sscanf(yytext, "%4d%2d%2d %2d%2d", &dt.tm_year, &dt.tm_mon, &dt.tm_mday, &dt.tm_hour, &dt.tm_min);

        /* NORMALIZE VALUES */
        dt.tm_year = dt.tm_year - 1900;         /* NORMALIZE YEAR */
        dt.tm_mon = dt.tm_mon - 1;              /* NORMALIZE MONTH */
        dt.tm_isdst = -1;                       /* NO INFORMATION ABOUT DST */
        mktime(&dt);                            /* NORMALIZE STRUCT TM */

        /* PRINT DATETIME */
        strftime(buffer, 80, "%c %z %Z\n", &dt);
        printf("%s\n", buffer);

        /* COPY STRUCT TM TO YACC RETURN VALUE */
        memcpy(&yylval.datetime, &dt, sizeof(dt));      /* DESTINATION FIRST, THEN SOURCE */

}

/* YYERROR */
void yyerror(const char* msg) {
        fprintf(stderr, "yyerror %s\n", msg);
        exit(1);
}
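For reference, the date conversion done by read_date_YYYYMMDD_HHMM can also be written as a standalone C89 function that checks the result of sscanf, clears the struct tm before filling it, and copies the result into the destination (not the other way round). This is only a sketch: the name parse_yyyymmdd_hhmm and its 1/0 return convention are hypothetical and not part of the original program.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: parse "YYYYMMDD HHMM" (the text matched by the
 * DATETIME rule) into *out. Returns 1 on success, 0 on failure.
 */
int parse_yyyymmdd_hhmm(const char *text, struct tm *out)
{
    struct tm dt;
    int year, month;

    /* Zero every field so tm_sec and the others are not left indeterminate. */
    memset(&dt, 0, sizeof dt);

    /* sscanf returns the number of successful conversions: all 5 are needed. */
    if (sscanf(text, "%4d%2d%2d %2d%2d",
               &year, &month, &dt.tm_mday, &dt.tm_hour, &dt.tm_min) != 5)
        return 0;

    dt.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    dt.tm_mon = month - 1;      /* and months from 0 */
    dt.tm_isdst = -1;           /* let mktime() decide whether DST applies */

    /* mktime() normalizes the fields and fills in tm_wday and tm_yday. */
    if (mktime(&dt) == (time_t)-1)
        return 0;

    /* memcpy takes the destination first: this copies dt INTO *out. */
    memcpy(out, &dt, sizeof dt);
    return 1;
}
```

In the lexer action one could then write something like `if (!parse_yyyymmdd_hhmm(yytext, &yylval.datetime)) yyerror("bad date");` before `return DATETIME;`.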

grammar.y

The grammar is intended to parse lines like this one (DATETIME,FLOAT,FLOAT,INTEGER):

20191201 17000,1.102290,1.102470,0
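%{

#include <time.h>
#include <stdio.h>

int yylex(void);                        /* GENERATED FROM lexer.l */
void yyerror(const char* msg);          /* DEFINED IN lexer.l */

%}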

%union {

        struct tm       datetime;               /* DATE TIME VALUES */
        double          float_number;           /* DOUBLE PRECISION (TYPICALLY 8 BYTES) */
        long            integer_number;         /* LONG INTEGER (8 BYTES ON LP64 PLATFORMS) */

}

%token  <datetime>              DATETIME
%token  <float_number>          FLOAT_NUMBER
%token  <integer_number>        INTEGER_NUMBER

%%

lastbid_lastask:        DATETIME ',' FLOAT_NUMBER ',' FLOAT_NUMBER ',' INTEGER_NUMBER   { printf("MATCH %f %f %ld\n", $3, $5, $7); }
                        ;

%%

int main(int argc, char *argv[]) {

        yyparse();

        return 0;

}
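The single rule in grammar.y accepts exactly one token sequence, followed by end of input. The sketch below spells that sequence out as a plain C check. It is a simplification for illustration only (yacc actually builds LALR parse tables), and the function name first_mismatch and the token values are assumptions.

```c
#include <stddef.h>

/* Token codes: assumed values above 255, as yacc uses for named tokens,
   so they cannot clash with single-character tokens such as ','. */
enum { DATETIME = 258, FLOAT_NUMBER, INTEGER_NUMBER };

/*
 * Hypothetical check mirroring the rule
 *   lastbid_lastask: DATETIME ',' FLOAT_NUMBER ',' FLOAT_NUMBER ',' INTEGER_NUMBER
 * Returns the index of the first unexpected (or missing) token,
 * or -1 if the whole sequence matches.
 */
int first_mismatch(const int *tokens, size_t n)
{
    static const int expected[] = {
        DATETIME, ',', FLOAT_NUMBER, ',', FLOAT_NUMBER, ',', INTEGER_NUMBER
    };
    const size_t len = sizeof expected / sizeof expected[0];
    size_t i;

    for (i = 0; i < len; i++) {
        if (i >= n || tokens[i] != expected[i])
            return (int)i;
    }
    /* The rule must be followed by end of input: extra tokens fail too. */
    return n == len ? -1 : (int)len;
}
```

A token stream with the right fields but no ',' tokens fails at index 1, and a stream whose first token is FLOAT_NUMBER or INTEGER_NUMBER fails at index 0.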

The makefile to build everything is as follows:

CCFLAGS = -std=c89 -c
YFLAGS = -d     # Also write the token header (grammar.h, because of -ogrammar.c)
OBJS = lexer.o grammar.o
TARGET = readfile

readfile:               $(OBJS)
                        cc $(OBJS) -std=c89 -ll -o $(TARGET)

grammar.h grammar.o:    grammar.y
                        yacc $(YFLAGS) -ogrammar.c grammar.y
                        cc $(CCFLAGS) grammar.c

lexer.o:                lexer.l grammar.h
                        lex -olexer.c lexer.l
                        cc $(CCFLAGS) lexer.c

clean:
                        rm -f $(OBJS) grammar.[ch] lexer.c

When I run readfile, lex seems to invoke yyerror right after parsing the DATETIME:

% ./readfile 
20191201 170003296,1.102290,1.102470,0
Mon Feb 17 22:20:00 2020 +0100 CET

yyerror syntax error

Same for numbers:

% ./readfile
45.45
yyerror syntax error
% ./readfile
45
yyerror syntax error

But not for arbitrary text:

% ./readfile
abc
abc

Why is lex invoking yyerror? What is missing in the lex parsing code?


Solution

  • As far as I can see, your lexer never returns a ',' token. By default, a (f)lex scanner copies every character that no rule matches to stdout, as your test with the input abc shows. The unrecognised comma did not appear in your output because stdout had not yet been flushed when yyerror() wrote its message to stderr; exit() flushes it only afterwards.

    In any event, we usually put a fallback rule as the last rule in the scanner specification:

    .    { return yytext[0]; }
    

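    That guarantees that any unrecognised character is passed through to the parser as a single-character token, with the same value as the quoted character in the grammar (for example ','). If the parser does not expect that token, it raises a syntax error immediately. Your tests with 45.45 and 45 fail for a separate reason: the grammar's only rule starts with DATETIME, so input whose first token is a FLOAT_NUMBER or INTEGER_NUMBER is rejected in any case.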
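The effect of the fallback rule can be checked without lex. The following C sketch imitates the scanner's rules by hand, following the lex conventions of longest match and first-listed rule on ties, and returns any other character as its own token code. It is not flex's generated code; next_token, the helper functions and the token values (above 255, as yacc uses for named tokens) are assumptions made for illustration.

```c
#include <string.h>

/* Token codes: assumed values above 255, so they cannot collide with
   single-character tokens such as ','. */
enum { DATETIME = 258, FLOAT_NUMBER, INTEGER_NUMBER };

#define DIGITS "0123456789"

/* [12][09][0-9][0-9][0-1][0-9][0-3][0-9][ ][0-2][0-9][0-5][0-9]
   Returns 13 if the DATETIME pattern matches at s, else 0. */
static size_t match_datetime(const char *s)
{
    static const char *const cls[13] = {
        "12", "09", DIGITS, DIGITS, "01", DIGITS, "0123", DIGITS,
        " ", "012", DIGITS, "012345", DIGITS
    };
    size_t i;

    for (i = 0; i < 13; i++) {
        /* test for '\0' first: strchr() would also "find" the terminator */
        if (s[i] == '\0' || strchr(cls[i], s[i]) == NULL)
            return 0;
    }
    return 13;
}

/* [0-9]+\.[0-9]+  Returns the length of the FLOAT_NUMBER match, else 0. */
static size_t match_float(const char *s)
{
    size_t whole = strspn(s, DIGITS), frac;

    if (whole == 0 || s[whole] != '.')
        return 0;
    frac = strspn(s + whole + 1, DIGITS);
    return frac == 0 ? 0 : whole + 1 + frac;
}

/* Hypothetical stand-in for yylex(): returns the next token code from *pos
   and advances *pos past it; returns 0 at end of input. */
int next_token(const char **pos)
{
    const char *s = *pos;
    size_t dt, fl, in;

    /* [\t ] { ; }  plus '\n', which '.' does not match (see note below) */
    while (*s == ' ' || *s == '\t' || *s == '\n')
        s++;
    if (*s == '\0') {
        *pos = s;
        return 0;
    }

    dt = match_datetime(s);
    fl = match_float(s);
    in = strspn(s, DIGITS);               /* [0-9]+ */

    /* longest match wins; on a tie, the rule listed first */
    if (dt > 0 && dt >= fl && dt >= in) { *pos = s + dt; return DATETIME; }
    if (fl > 0 && fl >= in)             { *pos = s + fl; return FLOAT_NUMBER; }
    if (in > 0)                         { *pos = s + in; return INTEGER_NUMBER; }

    /* fallback rule: . { return yytext[0]; } */
    *pos = s + 1;
    return (unsigned char)*s;
}
```

For the input `20191201 1700,1.102290,1.102470,0` successive calls return DATETIME, ',', FLOAT_NUMBER, ',', FLOAT_NUMBER, ',', INTEGER_NUMBER and then 0, which is exactly the sequence the grammar rule expects. One difference from real lex: '.' does not match a newline, so a scanner with only these rules would still ECHO '\n' instead of returning it; the sketch simply skips it.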