
Is it possible to get the value of tokens?


I was wondering if it is possible to get the values of tokens in yacc & lex. For instance, let's say that I have such a definition in my lex file:

";" {printf("%s ", yytext); return SEMICOLON;}

Now, is it possible to access the value of SEMICOLON in the main function of lex?


Solution

  • I was wondering if it is possible to get the values of tokens in yacc & lex.

    In a general sense, yes, of course it is.

    For instance, let's say that I have such a definition in my lex file:

    ";" {printf("%s ", yytext); return SEMICOLON;}
    

    Now, is it possible to access the value of SEMICOLON in the main function of lex?

    Your question seems to be basically about the scope of the identifier SEMICOLON, which depends on the form and location of its declaration(s). In comments you wrote that in your particular case,

    It is defined inside the lex file. after the %% part

    I take that as something along these lines in your lex input file:

    %%
    
    /* ... no rules before this */
    
        #define SEMICOLON 59
    
    /* ... */
    
    ";" {printf("%s ", yytext); return SEMICOLON;}
    

    In that case, the macro definition is copied into the body of the generated yylex() function, ahead of the code implementing the scanning rules. Because macros are handled by the preprocessor and ignore block scope, it is visible from that point through the end of the generated C source file unless it is explicitly undefined or redefined, but which other functions, if any, appear after that point is unspecified. Note also that if you declared it as a variable instead of a macro, it would be a local variable of the scanner function.
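The scoping behaviour just described can be seen in plain C, without lex. The following is only a minimal sketch: scan_one() is a hypothetical stand-in for the generated yylex(), and semicolon_code() stands for any function that happens to follow it in the same file. Neither name comes from lex itself.

```c
/* Hypothetical stand-in for the generated scanner function; not real lex output. */
int scan_one(void)
{
    /* A macro defined inside a function body is not scoped to that body:
       the preprocessor knows nothing about C blocks. */
#define SEMICOLON 59
    int local_code = SEMICOLON;   /* by contrast, this variable is local to scan_one() */
    return local_code;
}

/* Appears later in the same file, so the macro is still defined here. */
int semicolon_code(void)
{
    return SEMICOLON;
    /* "return local_code;" would not compile here: local_code is out of scope. */
}
```

Both functions return 59, but only because semicolon_code() happens to come after the #define in the same translation unit, which is exactly the kind of accident of layout you should not rely on in generated code.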

    That is not the way to do it.

    Declarations meant to be global to the C source generated by lex should go in the definitions section, enclosed between %{ and %}. Best practice is to put such things at or very near the top:

    %{
    #define SEMICOLON 59
    %}
    
    %%
    
    /* ... */
    
    ";" {printf("%s ", yytext); return SEMICOLON;}
    

    That will cause the definition to be placed at file scope, near the top of the generated C file.

    By itself, however, that does not provide visibility to any other source file in your project. If you are using yacc to generate a parser to accompany your lex-based scanner, then the idiomatic approach is to have yacc also generate a C header file containing the token identifiers (the -d option; default name y.tab.h), and then to #include that header in your lex input instead of defining the symbols there directly with #define. You can do the same by hand if you're not using yacc but still want to share token identifiers and codes.

    %{
    #include "y.tab.h"
    %}
    
    %%
    
    /* ... */
    
    ";" {printf("%s ", yytext); return SEMICOLON;}
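For the do-it-by-hand case, a rough single-file sketch in C might look like the following. Everything here is an assumption for illustration: the enum would normally live in a shared header (hypothetically "tokens.h"), next_token() is a hand-written stand-in for a lex-generated yylex(), and the value 258 is simply chosen above 255 so that it cannot collide with single-character codes, which is the convention yacc-generated headers also follow.

```c
/* Would live in a shared header (hypothetically "tokens.h") included by both
   the lex input and any other source file that needs the token codes. */
enum token {
    END_OF_INPUT = 0,   /* a scanner conventionally returns 0 at end of input */
    SEMICOLON = 258,    /* assumed value, above 255 to avoid clashing with char codes */
    OTHER               /* 259: any other character */
};

/* Hand-written stand-in for yylex(): classifies one character and
   advances the cursor past it. */
int next_token(const char **cursor)
{
    char c = **cursor;
    if (c == '\0')
        return END_OF_INPUT;
    ++*cursor;
    return c == ';' ? SEMICOLON : OTHER;
}

/* What code in another file could do once it sees the same header:
   compare returned codes against the shared name SEMICOLON. */
int count_semicolons(const char *text)
{
    int count = 0;
    int tok;
    while ((tok = next_token(&text)) != END_OF_INPUT) {
        if (tok == SEMICOLON)
            ++count;
    }
    return count;
}
```

With this layout a main() function, whether in the user code section of the lex file or in a separate source file, can refer to SEMICOLON by name as long as it includes the same header.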