Search code examples
bisonflex-lexeryacclex

Reduce/reduce conflict with a rule for two datetime formats


My data contains a datetime value. The datetime value may be expressed in military datetime:

day hour minute timezone

Example: 081837Z

Or, the datetime value may be expressed in month datetime:

month day hour minute timezone

Example: 08081837Z

Bison reports: 1 reduce/reduce conflict

The month, day, hour, and minute values are each 2 digits, so my lexer simply sends to the parser a 2-digit token for each value:

{TWODGT} { yylval.sval = malloc(yyleng + 1); strcpy(yylval.sval, yytext); return(TWODIGITS); }

Why am I getting the reduce/reduce conflict? What do I need to do to fix the problem? Below is my lexer followed by my parser.

Here is the ".l" file:

%{
#include "helloworld.tab.h"
%}


TWODGT [0-9]{2,2}
TZONE  Z

%%
{TWODGT}        { yylval.sval = malloc(yyleng + 1); strcpy(yylval.sval, yytext); return(TWODIGITS); }
{TZONE}         { yylval.sval = malloc(yyleng + 1); strcpy(yylval.sval, yytext); return(TZONE); }
\n              { return(EOL); }
%%
int yywrap(){ return 1;}

Here is the ".y" file:

%{
#include <stdio.h>
int yylex(void);
extern FILE *yyin;
void yyerror(const char* msg);
%}

%union
{
  char *sval;
}
%token <sval> TWODIGITS TZONE 
%token EOL

%type <sval> time tzone mildt mondt month day hour minute 

%start test

%%
test: time EOL  { printf("%s\n> ", $1); }
 ;
 
time: mildt                 { $$ = "Military DateTime"; }
 | mondt                    { $$ = "Month DateTime"; }
;

mildt: day hour minute tzone        
;

mondt: month day hour minute tzone  
;

month: TWODIGITS                      
;

day: TWODIGITS                      
;

hour: TWODIGITS                     
;

minute: TWODIGITS                   
;

tzone: TZONE                        
;

%%

int main(int argc, char *argv[])
{
    yyin = fopen(argv[1], "r");
    yyparse();
    fclose(yyin);
    
    return 0;
}

void yyerror(const char *msg)
{
  fprintf(stderr, "error: %s\n", msg);
}

Solution

  • The basic problem you have is lookahead -- when the parser sees the first TWODIGITS token, it does not know whether it should be reduced as a day for a mildt or month for a mondt -- which it is can't be known until the 4th token.

    You can avoid this by getting rid of all the identical unitary rules (for month/day/hour/minute):

    mildt: TWODIGITS TWODIGITS TWODIGITS TZONE
    mondt: TWODIGITS TWODIGITS TWODIGITS TWODIGITS TZONE
    

    this way, nothing needs to be reduced until you get to the TZONE at the end and know how many digits there are.