Tags: c, bison, yacc, lex

Access identifier content when using a custom type in Bison


I have a scanner and a parser ready, built with flex and bison.

The parser builds a tree directly in its actions. To do so, I created a struct called STreeNode and declared it as the semantic value type:

#define YYSTYPE_IS_DECLARED
typedef STreeNode* YYSTYPE;

The struct is:

typedef struct tagSTreeNode
{
    EOperationType type;
    int count;
    struct tagSTreeNode **children;
    char *string;
} STreeNode;

There are about 40 tokens, and every rule looks something like this:

unlabeled_statement:
        assignment                                                          {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
        | function_call_statement                                           {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
        | goto                                                              {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
        | return                                                            {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
        | conditional                                                       {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
        | repetitive                                                        {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
        | empty_statement                                                   {$$ = createNode(eUNLABELED_STATEMENT, 1, $1);}
        ;

The signature for the createNode function is

STreeNode *createNode(EOperationType type, int count, ...);
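
The body of createNode is not shown in the post. The following is only a minimal sketch of how such a variadic constructor might work, assuming that the extra arguments are the STreeNode * children (as the grammar actions above suggest) and that the string member starts out empty. The enum values other than eUNLABELED_STATEMENT are placeholders.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

/* Placeholder operation codes; the real EOperationType is not shown in the post. */
typedef enum { eIDENTIFIER, eASSIGNMENT, eUNLABELED_STATEMENT } EOperationType;

typedef struct tagSTreeNode
{
    EOperationType type;
    int count;
    struct tagSTreeNode **children;
    char *string;
} STreeNode;

/* Allocates a node and copies `count` child pointers from the variadic arguments. */
STreeNode *createNode(EOperationType type, int count, ...)
{
    assert(count >= 0);

    STreeNode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;

    node->type = type;
    node->count = count;
    node->children = NULL;
    node->string = NULL; /* nothing fills this in yet, which is the problem described below */

    if (count > 0) {
        node->children = malloc((size_t)count * sizeof *node->children);
        if (node->children == NULL) {
            free(node);
            return NULL;
        }

        va_list args;
        va_start(args, count);
        for (int i = 0; i < count; i++)
            node->children[i] = va_arg(args, STreeNode *);
        va_end(args);
    }
    return node;
}
```

Because every variadic argument is read back as STreeNode *, a constructor of this shape has no way to receive the token text, so the string member has to be filled in some other way.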

The tree works fine. The problem is getting at the actual text of variable names, function names, and so on. Since YYSTYPE is a pointer to STreeNode, $x never carries the string value I want to store in the node's char *string member.

I have a %token called IDENTIFIER and another called INTEGER, and those should receive the values I want.

While researching, I found that I could use a %union so that every token has a specific type. Could that help? If so, would I need to specify the type of every single token? How would that be implemented?

What about yytext? Couldn't that be used to achieve this goal?

Thank you!

--- EDIT ---

So I've created

%union {
    char *string;
    STreeNode *node;
}

and declared every terminal and nonterminal to be one of those types. The nodes still work, but the strings (accessed through $1, for example) come back as null.
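
For reference, declarations of this kind usually take roughly the following shape. This is only a sketch: IDENTIFIER, INTEGER, unlabeled_statement and assignment are names from the post, but which symbol gets which type is an assumption here.

```yacc
%union {
    char *string;
    STreeNode *node;
}

/* Terminals whose value is text supplied by the scanner. */
%token <string> IDENTIFIER INTEGER

/* Nonterminals whose value is a tree node built in an action. */
%type <node> unlabeled_statement assignment
```

These declarations only tell bison which union member $1, $$ and so on refer to; they do not put any value into that member.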

Do I need to change anything in the scanner as well? My scanner has:

[a-zA-Z][a-z0-9A-Z]*        { return IDENTIFIER; }
[0-9]+                      { return INTEGER; }

Thanks again.


Solution

  • If your tokens have a type declared for them, the lexer needs to set the matching member of yylval before it returns the token; bison does not fill it in for you. Something like:

    [a-zA-Z][a-z0-9A-Z]*        { yylval.string = strdup(yytext); return IDENTIFIER; }
    [0-9]+                      { yylval.string = strdup(yytext); return INTEGER; }
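
Once the scanner rules above set yylval.string, a grammar action can pass $1 to a helper that stores it in the node. The sketch below shows that hand-off in plain C. It rests on several assumptions: createLeaf, freeLeaf and scanText are hypothetical helpers, the enum values are placeholders, and the YYSTYPE typedef stands in for the union bison would generate from the %union in the question. The copy made by strdup matters because yytext is only valid until the next token is scanned.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() under strict ISO C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder operation codes; the real EOperationType is not shown in the post. */
typedef enum { eIDENTIFIER, eINTEGER } EOperationType;

typedef struct tagSTreeNode
{
    EOperationType type;
    int count;
    struct tagSTreeNode **children;
    char *string;
} STreeNode;

/* Stand-in for the union bison generates from the %union declaration. */
typedef union
{
    char *string;
    STreeNode *node;
} YYSTYPE;

/* Hypothetical: mimics the scanner action by copying the matched text. */
YYSTYPE scanText(const char *yytext)
{
    YYSTYPE value;
    value.string = strdup(yytext);
    return value;
}

/* Hypothetical helper: builds a childless node that takes ownership of `text`. */
STreeNode *createLeaf(EOperationType type, char *text)
{
    assert(text != NULL); /* NULL here means the scanner never set yylval.string */

    STreeNode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;

    node->type = type;
    node->count = 0;
    node->children = NULL;
    node->string = text;
    return node;
}

/* Releases a leaf together with the string allocated by strdup(). */
void freeLeaf(STreeNode *node)
{
    if (node != NULL) {
        free(node->string);
        free(node);
    }
}

/*
 * In the grammar, the matching action might read:
 *     identifier: IDENTIFIER { $$ = createLeaf(eIDENTIFIER, $1); } ;
 * where $1 is the yylval.string value set by the scanner rule.
 */
```

Since each token's text is heap-allocated, whatever code tears the tree down should free the string member along with the node, as freeLeaf does here.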