
Parser with LEX and YACC


I'm trying to implement a time parser with LEX & YACC. I'm a complete newbie to those tools and to C programming.

The program has to print a message (Valid time format 1: input ) when one of these formats is entered: 4pm, 7:38pm, 23:42, 3:16, 3:16am; otherwise an "Invalid character" message is printed.

lex file time.l :

%{
#include <stdio.h>
#include "y.tab.h"
%}

%%

[0-9]+                {yylval=atoi(yytext); return digit;}
"am"                   { return am;}
"pm"                   { return pm;}
[ \t\n]               ;
[:]                    { return colon;}
.                     { printf ("Invalid character\n");}

%%

yacc file time.y:

%{
void yyerror (char *s);
int yylex();
#include <stdio.h>
#include <string.h>

%}

%start time
%token digit
%token am
%token pm
%token colon

%%

time        :  hour ampm           {printf ("Valid time format 1 : %s%s\n ", $1, $2);}
            |  hour colon minute   {printf ("Valid time format 2 : %s:%s\n",$1, $3);}
            |  hour colon minute ampm {printf ("Valid time format 3 : %s:%s%s\n",$1, $3, $4); }
            ;

ampm        :   am               {$$ = "am";}
            |   pm               {$$ = "pm";}
            ;

hour        :   digit digit             {$$ = $1 * 10 + $2;}
            |   digit             { $$ = $1;}
            ;

minute      :   digit digit         {$$ =  $1 * 10 + $2;} 
            ;

%%
int yywrap()
{
        return 1;
} 

int main (void) {

  return yyparse();
}

void yyerror (char *s) {fprintf (stderr, "%s\n", s);}

compiling with this command:

yacc -d time.y && lex time.l && cc lex.yy.c y.tab.c -o time

I'm getting some warnings:

time.y:17:47: warning: format specifies type 'char *' but the argument has type
      'YYSTYPE' (aka 'int') [-Wformat]
    {printf ("Valid time format 1 : %s%s\n ", (yyvsp[(1) - (2)]), (yyvsp.

This warning appears for all the variables in printf statements. The values are all char, because even the number in the time string is converted with the atoi function.

Executing the program with a valid input throws this error:

./time

1pm

[1]    2141 segmentation fault  ./time

Can someone help me? Thanks in advance.


Solution

  • This (f)lex rule:

    [0-9]+                {yylval=atoi(yytext); return digit;}
    

    recognizes any integer, not just a digit. (It allows leading zeros, which is probably appropriate for a time parser.) It assumes that yylval is an int, which is the case if you don't do something to declare the type of yylval.

    Meanwhile, this (f)lex rule:

    "am"                 { return am;}
    

    recognizes the token am, but does not set the value of yylval.

    Now, in your bison file, you have:

    hour        :   digit digit       { $$ = $1 * 10 + $2; }
                |   digit             { $$ = $1;}
                ;
    

    Since digit actually represents an entire integer, the digit digit production is incorrect. It would recognize, for example, the input 23 75 (since your flex file ignores whitespace), but it would turn that into the value 305 (10*23 + 75). That hardly seems appropriate. Again, it assumes that the type of the semantic values $$ and $1 is int, which is the default case.
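
    One way to keep an hour or minute to at most two digits is to check the length of the matched text before accepting it. A minimal sketch in plain C, assuming a hypothetical helper number_text_ok that a lexer action could call with yytext; the name and the two-digit limit are illustrative and not part of the original code:

    ```c
    #include <ctype.h>
    #include <string.h>

    /* Hypothetical helper: returns 1 if text consists of one or two
     * decimal digits, 0 otherwise. */
    int number_text_ok(const char *text)
    {
        size_t len = strlen(text);
        if (len < 1 || len > 2)
            return 0;
        for (size_t i = 0; i < len; i++) {
            if (!isdigit((unsigned char)text[i]))
                return 0;
        }
        return 1;
    }
    ```

    With a check like this, the grammar could use a single number token for the hour and for the minute instead of the digit digit production.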

    However, the production:

    ampm        :   am               {$$ = "am";}
                |   pm               {$$ = "pm";}
                ;
    

    requires that the type of the result semantic value be char * (or even const char*). Since you have not done anything to declare the type of semantic values, their type is int and the assignment is just as invalid as would be the C statement:

    int ampm = "am";
    

    So the C compiler will complain about it; many compilers report this only as a warning, which is why the program still built.
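
    A type that can hold either an integer or a string is usually expressed as a union. The sketch below is plain C that mimics the kind of type a %union declaration in the grammar file would produce; the names semval, num, str, make_num and make_str are made up for illustration:

    ```c
    /* Illustrative stand-in for a parser's semantic value type:
     * numbers use .num, am/pm markers use .str. */
    union semval {
        int num;
        const char *str;
    };

    /* Build a value holding an integer (e.g. an hour). */
    union semval make_num(int n)
    {
        union semval v;
        v.num = n;
        return v;
    }

    /* Build a value holding a string (e.g. "am" or "pm"). */
    union semval make_str(const char *s)
    {
        union semval v;
        v.str = s;
        return v;
    }
    ```

    In bison, each token and nonterminal is then tied to one member (for instance with %token <num> digit and %type <str> ampm), so that $1 and $$ refer to the right field.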

    Furthermore, in your production:

    time        :  hour ampm           {printf ("Valid time format 1 : %s%s\n ", $1, $2);}
    

    you assume that the semantic values $1 and $2 are strings (char*). But the values are actually integers, so printf will do something undefined and probably disastrous (in this case, a segfault). (Because of the nature of C, this is not a compile-time error, but most C compilers will issue a warning. Apparently, your C compiler does so.)

    How this should be fixed depends on your interpretation of the assignment. When it says "print a message (Valid time format 1: input )", does it mean that the literal input string should be printed, or is it ok to print an interpretation of the string? That is, given actual inputs

    8:23am
    08:23am
    

    Would you want the messages to be

    Valid time format 1: 8:23am
    Valid time format 1: 08:23am
    

    Or is it appropriate to normalize:

    Valid time format 1: 8:23am
    Valid time format 1: 8:23am
    

    You should (re-)read the section in the bison manual on semantic types, and then decide whether you want the type to be int, char*, or a union of the two.
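
    To make the normalizing option concrete: once the hour and minute are integers, the message can be rebuilt from the values, so 08:23am and 8:23am print identically. In this sketch, format_time is a hypothetical helper, and printing the hour without a leading zero is an assumption about the desired output:

    ```c
    #include <stdio.h>

    /* Hypothetical helper: writes "H:MMsuffix" into buf and returns the
     * snprintf result. suffix is "am", "pm", or "" for 24-hour input.
     * Integers are printed with %d, strings with %s. */
    int format_time(char *buf, size_t size, int hour, int minute,
                    const char *suffix)
    {
        return snprintf(buf, size, "%d:%02d%s", hour, minute, suffix);
    }
    ```

    Printing the literal input instead would require keeping the matched text (for example, a copy of yytext) as the semantic value.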

    Some other things you need to think about:

    1. Your flex file recognizes any integer, but neither hours nor minutes can be arbitrary integers. Both are limited to two digits; normally, the minutes should always be two digits (so that 9:3am is not a way of writing 9:03am). They both have limited ranges of valid values; minutes must be between 00 and 59, while hours is between 1 and 12 if am or pm is specified, and otherwise between 0 and 23. Or perhaps 24. (Actually, there are lots of different possible validity conventions for hours; you might choose to be flexible or strict.)

    2. Your problem description doesn't appear to allow spaces in the time specifications, but your flex file ignores whitespace. So that might lead you to recognize incorrect inputs (depending, again, on how strict you wish to be). Also see the note about output in this case: does the whitespace appear in the output (assuming it is acceptable)?

    3. Your flex file issues an error message when it sees a character it doesn't recognize, but it does not stop lexing. In effect, that means that illegal characters will be dropped from the input stream, so that an input like:

      1;:17rpm
      

      will result in two illegal character messages followed by a message saying that the input was a valid 1:17pm. That is unlikely to be what you wanted.
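
    The range rules from point 1 can be checked in ordinary C once the numbers have been parsed. A sketch under one possible convention (1 to 12 with am/pm, 0 to 23 without, minutes 0 to 59); time_is_valid and its parameters are illustrative names, and a stricter or looser convention may suit the assignment better:

    ```c
    /* Returns 1 if the values form a valid time under the convention
     * stated above, 0 otherwise. has_ampm is nonzero when an am/pm
     * suffix was present in the input. */
    int time_is_valid(int hour, int minute, int has_ampm)
    {
        if (minute < 0 || minute > 59)
            return 0;
        if (has_ampm)
            return hour >= 1 && hour <= 12;
        return hour >= 0 && hour <= 23;
    }
    ```

    A grammar action could call such a function before printing the "Valid time format" message and report an error otherwise.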

    As a final note, I have to say that in my opinion, understanding C is an absolute prerequisite to using flex and bison. Trying to teach all three at the same time strikes me as pedagogically suspect.