Passing data from flex to a bison grammar

I am having an issue with my flex/bison grammar. I am not sure whether the way I have set up the recursion is what is tripping me up.

To access the data passed via yylval, I use the $1, $2, ... variables for each element of a production. However, the values are not split per token: $1 prints the whole row instead of just the first token. This only happens with the second alternative of the metadata rule; the first seems to be OK.

I was intending to create a check(int token_val) function containing a switch(token_val) that checks the return value for each token and then acts on its yytext appropriately. Is there a way to use the $ notation to get the token value returned for the command production? Or is this the wrong way to go about things?

I have checked the references but may have missed something; I would appreciate it if someone could clarify.

Code: bison



input: input metadata
     | metadata
     ;

metadata: command op data {printf("%s is valid.\n", $3);} // check_data($1) ?
        | data op data op data op data {printf("row data is valid\n\t %s\n", $1);}
        ;

command: PROD_TITL
      |  _DIR
      |  DOP
      |  DIT
      |  FORMAT
      |  CAMERA
      |  CODEC
      |  DATE
      |  REEL
      |  SCENE
      |  SLATE
      ;

op: EQUALS
  | COLON
  | SEP
  ;

data: META
    | REEL_ID
    | SCENE_ID
    | SLATE_ID
    | TAKE
    | MULTI_T
    | LENS
    | STOP
    | FILTERS
    ;

%%

int main(void) {
  return yyparse();
}

lex:

%{
#include <stdio.h>
#include <string.h>
#include "ca_mu.tab.h"
%}

%option yylineno

%%


\"[^"\n]*["\n]              {yylval = yytext; return META;}
[aA-aZ][0-9]+               {yylval = yytext; return REEL_ID;}
([0-9aA-zZ]*\/[0-9aA-zZ]*)  {yylval = yytext; return SCENE_ID;}
[0-9]+                      {yylval = yytext; return SLATE_ID;}
[0-9][aA-zZ]+               {yylval = yytext; return TAKE;}
[0-9]+-[0-9]+               {yylval = yytext; return MULTI_T;}
[0-9]+MM                    {yylval = yytext; return LENS;}
T[0-9]\.[0-9]+              {yylval = yytext; return STOP;}
"{"([^}]*)"}"               {yylval = yytext; return FILTERS;}

Output sample:

"My Production" is valid.
"Dir Name" is valid.
"DOP Name" is valid.
"DIT Name" is valid.
"16:9" is valid.
"Arri Alexa" is valid.
"ProRes" is valid.
"02/12/2020" is valid.
A001 is valid.
23/22a is valid.
001 is valid.
row data is valid
         1, 50MM, T1.8, { ND.3 }  // $1 prints all tokens?
row data is valid
         3AFS,   50MM, T1.8, {ND.3}

Input:

/* This is a comment */

production_title = "My Production"
director         = "Dir Name"
DOP              = "DOP Name"
DIT              = "DIT Name"
format           = "16:9"
camera           = "Arri Alexa"
codec            = "ProRes"
date             = "02/12/2020"

reel: A001
  scene: 23/22a
    slate: 001
      1, 50MM, T1.8, { ND.3 }
      3AFS,   50MM, T1.8, {ND.3}
    slate: 002
      1,  65MM, T1.8, {ND.3 BPM1/2}
    slate: 003
      1-3, 24MM, T1.9, {ND.3}

END

Solution

The problem is here, in your scanner actions:

yylval = yytext;

You must never do this.

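yytext points into the scanner's internal buffer, and its contents are only valid until the next call to yylex(). Storing that pointer in yylval therefore leaves the parser holding a dangling pointer. This is also why $1 in the second alternative prints the rest of the row: flex NUL-terminates the current token in place and restores the overwritten character on the next call, so by the time the rule is reduced the old pointer reads on through the later tokens. The first alternative most likely only appears to work because $3 is the token scanned most recently. Always copy the string, as with: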

yylval = strdup(yytext);
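
To make the difference concrete, here is a toy sketch in plain C. It is not flex's real code: buffer, toy_lex, toy_text and run_demo are hypothetical names, and the scanner only imitates the way a token is terminated in place and then un-terminated on the next call. It keeps one pointer the way yylval = yytext does and one copy the way yylval = strdup(yytext) does.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy scanner state; all names here are hypothetical, not flex's own. */
static char  buffer[] = "1, 50MM, T1.8";
static char *cursor   = buffer;   /* next unread character */
static char *hold_pos = NULL;     /* where the last terminator was written */
static char  hold_char;           /* the character that terminator replaced */
static char *toy_text;            /* plays the role of yytext */

/* Scan one comma/space separated token; return 0 at end of input. */
static int toy_lex(void)
{
    if (hold_pos != NULL)
        *hold_pos = hold_char;    /* undo the previous token's terminator */
    hold_pos = NULL;

    while (*cursor == ',' || *cursor == ' ')
        cursor++;
    if (*cursor == '\0')
        return 0;

    toy_text = cursor;
    while (*cursor != '\0' && *cursor != ',' && *cursor != ' ')
        cursor++;

    hold_pos  = cursor;           /* terminate the token in place */
    hold_char = *cursor;
    *cursor   = '\0';
    return 1;
}

struct demo_result {
    const char *stale;   /* pointer kept from the first token */
    char       *copied;  /* strdup() copy of the first token; caller frees */
};

struct demo_result run_demo(void)
{
    struct demo_result r;

    toy_lex();                    /* first token: "1" */
    r.stale  = toy_text;          /* like yylval = yytext */
    r.copied = strdup(toy_text);  /* like yylval = strdup(yytext) */
    assert(r.copied != NULL);

    while (toy_lex())             /* scan the remaining tokens */
        ;
    return r;
}
```

After run_demo() returns, copied still holds "1", while stale now reads "1, 50MM, T1.8", which is the same symptom as the row output in the question.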

(Don't forget to call free() on the copied strings when you no longer need the copies.)
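
One place to release the copies is the action that consumes them. The fragment below is only a sketch against the question's grammar, not a complete file: token declarations and yyerror are omitted. It assumes every semantic value is a char * (declared here with %define api.value.type, which needs Bison 3; the original prologue is not shown) and that only the data tokens carry strdup'd strings, so $1 and $2 of the first alternative are left alone.

```yacc
%define api.value.type {char *}   /* assumption: every semantic value is a char * */

%{
#include <stdio.h>
#include <stdlib.h>   /* free() */
%}

%%

metadata: command op data
            { printf("%s is valid.\n", $3); free($3); }
        | data op data op data op data
            { printf("row data is valid\n\t %s\n", $1);
              /* release all four copies, not only the one printed */
              free($1); free($3); free($5); free($7); }
        ;
```

If the values need to outlive the action, hand them to whatever data structure stores them and free them there instead.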