Structure of yacc definitions


I am in the process of writing a parser for a markup language for a personal project:

sample:

/* This is a comment */

production_title = "My Production"
director         = "Joe Smith"
DOP              = "John Blogs"
DIT              = "Random Name"
format           = "16:9"
camera           = "Arri Alexa"
codec            = "ProRes"
date             = _auto

Reel: A001
  Scene: 23/22a
    Slate: 001
      1-2, 50MM, T1.8, {ND.3}
      3AFS,   50MM, T1.8, {ND.3}
    Slate: 002:
      1,  65MM, T1.8, {ND.3 BPM1/2}
    Slate: 003:
      1-3, 24MM, T1.9 {ND.3}

Reel: A002
  Scene: 23/22a
    Slate: 004
      1-5, 32MM, T1.9, {ND.3}
  Scene: 23/21
    Slate: 005
      1, 100MM, T1.9, {ND.6}

END

I have started learning lex and yacc, and have run into a couple of issues regarding the structure of the grammar definitions.

yacc.y

%{
#include <stdio.h>
int yylex();
void yyerror(char *s);
%}

%token PROD_TITL _DIR DOP DIT FORMAT CAMERA CODEC DATE EQUALS
%right META

%%

meta: PROD_TITL EQUALS META {
            printf("%s is set to %s\n",$1, $3);
      }
      | _DIR EQUALS META {
            printf("%s is set to %s\n",$1, $3);
      }
%%

int main(void) {
  return yyparse();
}

void yyerror(char *s) {fprintf(stderr, "%s\n", s);}

lex.l

%{
#include <stdio.h>
#include <string.h>
#include "y.tab.h"
%}

%%

"production_title"  {yylval = strdup(yytext); return PROD_TITL;}
"director"          {yylval = strdup(yytext); return _DIR;}
"DOP"               return DOP;
"DIT"               return DIT;
"format"            return FORMAT;
"camera"            return CAMERA;
"codec"             return CODEC;
"date"              return DATE;

"exit"              exit(EXIT_SUCCESS);

\"[^"\n]*["\n]      { yylval = strdup(yytext);
                      return META;
                    }

=                   return EQUALS;
[ \t\n]                     ;
"/*"([^*]|\*+[^*/])*\*+"/"  ;
.                   printf("unrecognized input\n");
%%

int yywrap(void) {
  return 1;
}

The main issue is that the program parses only the first statement correctly; after that it reports a syntax error on input that should be valid. Is this something to do with the way I have written the grammar?

example output from sample.txt and typed in commands:

hc@linuxtower:~/Documents/CODE/parse> ./a.out < sample.txt
production_title is set to "My Production"
syntax error
hc@linuxtower:~/Documents/CODE/parse> ./a.out
production_title = "My Production"
production_title is set to "My Production"
director = "Joe Smith"
syntax error

When compiling, I get warnings in the lex.l file with regard to my regexes:

ca_mu.l: In function ‘yylex’:
ca_mu.l:9:9: warning: assignment makes integer from pointer without a cast [-Wint-conversion]
 "production_title"  {yylval = strdup(yytext); return PROD_TITL;}
         ^
ca_mu.l:10:9: warning: assignment makes integer from pointer without a cast [-Wint-conversion]
 "director"          {yylval = strdup(yytext); return _DIR;}
         ^
ca_mu.l:20:10: warning: assignment makes integer from pointer without a cast [-Wint-conversion]
 \"[^"\n]*["\n]      { yylval = strdup(yytext);
          ^

Could this be the source of the problem or an additional issue?


Solution

  • Those are two separate issues.

    1. Your grammar is as follows, leaving out the actions:

      meta: PROD_TITL EQUALS META
          | _DIR EQUALS META 
      

      That means that your grammar accepts one of two sequences, both having exactly three tokens. That is, it accepts "PROD_TITL EQUALS META" or "_DIR EQUALS META". That's it. Once it finds one of those things, it has parsed as much as it knows how to parse, and it expects to be told that the input is complete. Any other input is an error. To accept a whole file, the grammar needs a rule that describes a sequence of such statements.

    2. The compiler is complaining about yylval = strdup(yytext); because it has been told that yylval is of type int. That's yacc/bison's default semantic type; if you don't do anything to change it, that's what bison will assume, and it will declare yylval with that type in the header file it generates, so that the lexer knows what the semantic type is. If you search the internet you'll probably find a variety of macro hacks suggested to change this, but the correct way to do it with a "modern" bison is to insert the following declaration in your bison file, somewhere in the declarations section (before the first %%):

      %define api.value.type { char* }
      

      Later on, you'll probably find that you want a union type instead of making everything a string. Before you reach that point, you should read the section in the Bison manual on Defining Semantic Values. (In fact, you'd be well-advised to read the Bison manual from the beginning up to that point, including the simple examples in section 2. It's not that long, and it's pretty easy reading.)
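
    Putting both points together, a minimal sketch of yacc.y might look like the following. It assumes Bison 3.0 or newer (needed for %define api.value.type) and the lex.l from the question, unchanged. The rule names meta_list and key are made up for illustration. Only production_title and director are accepted as keys, because those are the only keyword rules in the lexer that set yylval.

    ```yacc
    %{
    #include <stdio.h>
    #include <stdlib.h>
    int yylex(void);
    void yyerror(const char *s);
    %}

    /* Every semantic value is a C string, so yylval = strdup(yytext)
       in the lexer no longer converts a pointer to an int. */
    %define api.value.type {char*}

    %token PROD_TITL _DIR DOP DIT FORMAT CAMERA CODEC DATE EQUALS META

    %%

    /* Hypothetical rule: the input is zero or more statements,
       rather than exactly one. */
    meta_list: /* empty */
             | meta_list meta
             ;

    meta: key EQUALS META   { printf("%s is set to %s\n", $1, $3);
                              free($1);   /* both strings came from strdup */
                              free($3); }
        ;

    /* Hypothetical rule: the keys whose lexer rules set yylval.
       The default action ($$ = $1) passes the string up. */
    key: PROD_TITL
       | _DIR
       ;

    %%

    int main(void) {
      return yyparse();
    }

    void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
    ```

    With this grammar, the first two assignments in sample.txt should both be reported. The third line (DOP = ...) would still be a syntax error until the lexer gives DOP a semantic value and DOP is added to key.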