
Getting invalid token to Bison parser from yylex


I have made a lex/bison parser that uses named tokens in the lex rules, like: a {return Tok_A;}, with a matching declaration %token Tok_A in the yacc file, followed by the grammar. Everything works fine: if the string is correct, it is accepted. Now I am trying to make a more general parser that uses the alphabet characters directly in lex. For some reason, yacc reports an invalid token when I send the character "a":

/* parser.l */
%option noyywrap
%{
#include "parser4.tab.h"
%}

%%
[a-h]    {return *yytext;}   /* return the character code itself */
\n   {return 0;}  /* EOF */
%%

/* parser.y */
%{
   #include <stdio.h>   /* for printf */
   extern void yyerror(char *);
   extern int yylex(void);
   #define YYDEBUG 1
%}
 
%token a 

%%
S : a {printf("S->a");}
%%

int main(void)
{
#if YYDEBUG
  yydebug = 1;
#endif
    if(!yyparse())
        printf("End of input reached\n");
    return 0;
}

void yyerror (char *s)
{
  /* fprintf (stderr, "%s\n", s); */
  printf("Incorrect derivation!\n");
}

When I compile and run the program and enter a, the output is:

Starting parse
Entering state 0
Stack now 0
Reading a token
a
Next token is token "invalid token" ()
Incorrect derivation!
Cleanup: discarding lookahead token "invalid token" ()
Stack now 0


I think the problem is in lex, in the rule that does return *yytext. If I understand it correctly, yacc and lex communicate through parser.tab.h, which contains the definitions that map token names to integers. Named tokens are numbered above 255, while 0-255 are used for plain characters. So should I somehow translate the token to ASCII in lex? I thought that when lex sends the "a" character directly, bison/yacc would understand it.
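To see what the lexer actually hands to the parser, here is a minimal hand-written stand-in for the two flex rules above, in plain C so it can be tried without flex. The names toy_input and toy_yylex are hypothetical and only for illustration. It suggests that return *yytext; already produces the character's own code (97 for 'a' in ASCII), so no extra translation to ASCII should be needed; what matters is whether the grammar knows that code.

```c
/* Hypothetical hand-written stand-in for the flex rules
 *     [a-h]  {return *yytext;}
 *     \n     {return 0;}
 * It scans a string instead of stdin so it can be tried without flex. */
static const char *toy_input;   /* remaining input; set by the caller */

static int toy_yylex(void)
{
    while (*toy_input != '\0') {
        char c = *toy_input++;
        if (c >= 'a' && c <= 'h')
            return c;   /* same as `return *yytext;`: the character code itself */
        if (c == '\n')
            return 0;   /* 0 tells the parser "end of input" */
        /* any other character is skipped here (flex would ECHO it by default) */
    }
    return 0;           /* end of the string also ends the input */
}
```

For example, with toy_input = "a\n", the first call returns 'a' (97) and the second returns 0, which the parser treats as end of input.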


Solution

  • When you declare %token a, it defines a as the name of a token (with its own number above 255) that you could return from lex. That is not the same as the character 'a'. If you want to use the character 'a' as a token in the grammar, you DON'T need to declare it, but you DO need single quotes around it: 'a', not a. A quoted character token's number is the character's own code, which is exactly what return *yytext; produces.

    In your case, change the yacc grammar to

    S : 'a' {printf("S->a");}
    

    and it will work.
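The difference between the two grammar versions can be sketched as a lookup from the integer returned by yylex() to a grammar symbol. This is a simplified model, not Bison's real yytranslate table: token_name, TOK_EOF and TOK_NAMED_a are hypothetical names, the value 258 assumes Bison's usual numbering for the first user-declared %token, and the returned strings are modeled loosely on the names in the debug trace.

```c
#include <string.h>

/* Assumed token codes: 0 = end of input, 258 = first user-declared %token
 * in Bison. Illustrative constants, not taken from a generated header. */
enum { TOK_EOF = 0, TOK_NAMED_a = 258 };

/* Hypothetical model of how the generated parser classifies each value
 * returned by yylex(). `quoted` selects the grammar version:
 *   0 -> `%token a` + `S : a`   (only code 258 is a known token)
 *   1 -> `S : 'a'`              (code 'a', i.e. 97, is a known token) */
static const char *token_name(int code, int quoted)
{
    if (code == TOK_EOF)
        return "end of file";
    if (!quoted && code == TOK_NAMED_a)
        return "a";
    if (quoted && code == 'a')
        return "'a'";
    return "invalid token";   /* what the trace reported for the character a */
}
```

With the original grammar, token_name('a', 0) gives "invalid token", matching the trace in the question; with the quoted grammar, token_name('a', 1) gives "'a'".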