Tags: c, parsing, bison, flex-lexer, yacc

Segmentation fault with yacc/bison


I am trying to write a simple HTTP request parser for a school assignment, but I keep hitting a segmentation fault that I can't get rid of. I think my production rules are OK. I ran the parser with Bison tracing enabled, and it always segfaults at the point where it parses my header:

Reducing stack by rule 9 (line 59):
   $1 = token ID ()
   $2 = token COLON ()
   $3 = token STRING ()
[4]    36661 segmentation fault (core dumped)  ./problem1 < input.txt

Here is the content of my request.l file:

%option noyywrap
%{
    #include<stdio.h>
    #include "request.tab.h"
    char *strclone(char *str);
%}

num                                     [0-9]+(\.[0-9]{1,2})?
letter                                  [a-zA-Z]
letternum                               [a-zA-Z0-9\-]
id                                      {letter}{letternum}*
string                                  \"[^"]*\"
fieldvalue                              {string}|{num}

%%

(GET|HEAD|POST|PUT|DELETE|OPTIONS)      { yylval = strclone(yytext); return METHOD; }
HTTP\/{num}                             { yylval = strclone(yytext); return VERSION; }
{id}                                    { yylval = strclone(yytext); return ID; }
"/"                                     { return SLASH; }
"\n"                                    { return NEWLINE; }
{string}                                { yylval = strclone(yytext); return STRING; }
":"                                     { return COLON; }
[ \t\n]+                                       ;
. {
    printf("Unexpected: %c\nExiting...\n", *yytext);
    exit(0);
}

%%

char *strclone(char *str) {
    int len = strlen(str);
    char *clone = (char *)malloc(sizeof(char)*(len+1));
    strcpy(clone,str);
    return clone;
}

and my request.y file:

%{
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define YYSTYPE char*

extern int yylex();
extern int yyparse();
extern FILE* yyin;

void yyerror(const char* s);
%}

%token METHOD
%token SLASH
%token VERSION
%token STRING
%token ID
%token COLON
%token NEWLINE

%%

REQUEST: METHOD URI VERSION NEWLINE HEADERS {
       printf("%s %s", $1, $2);
    }
;

URI: SLASH DIR {
        $$ = (char *)malloc(sizeof(char)*(1+strlen($2)+1));
        sprintf($$, "//%s", $2);
    }
;

DIR: ID SLASH {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+2));
        sprintf($$, "%s//", $1);
    }
    |ID {
        $$ = $1;
    }
    | {
        $$ = "";
    }
;

HEADERS: HEADER {
        $$ = $1;
    }
    |HEADER NEWLINE HEADERS {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+1+strlen($3)+1));
        sprintf($$, "%s\n%s", $1, $3);
    }
    |{
        $$ = "";
    }
;

HEADER: ID COLON STRING {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+1+strlen($2)+1));
        sprintf($$, "%s:%s", $1, $3);
    }
;

%%

void yyerror (char const *s) {
   fprintf(stderr, "Poruka nije tacna\n");
}

int main() {
    yydebug = 1;
    yyin = stdin;

    do {
        yyparse();
    } while(!feof(yyin));

    return 0;
}

Also, here is the content of the input.txt file I am passing in:

GET / HTTP/1.1
Host: "developer.mozzila.org"
Accept-language: "fr"

Solution

  • In request.y, you include the directive

    #define YYSTYPE char*
    

    So in the parser code generated by Bison, yylval is of type char*. But a #define in the %{ ... %} prologue is only copied into the generated parser; it does not reach request.l or the header request.tab.h. So in the scanner code generated by Flex, yylval has its default type, int.

    You could fix this by adding the definition of YYSTYPE to your request.l file, but then you have the same setting repeated in two places, which is a recipe for disaster. Instead, use Bison's declaration syntax:

    %define api.value.type { char* }
    

    (Note: that's a Bison declaration, not a C preprocessor define, so it goes with your other Bison % directives.)

    The advantage of this solution is that Bison also adds the declaration to the header file it produces. Since that file is #included in request.l, no modifications need to be made to your scanner.

    C, unfortunately, allows a pointer to be converted to an integer type even if the integer type is too narrow to hold the entire address, which is the case on a typical 64-bit platform with 8-byte pointers and 4-byte int. So when your scanner assigns an eight-byte pointer to what the compiler thinks is a four-byte int, the value is truncated, and when the parser later tries to use it as an address, you'll get a segfault. If you're lucky.
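
To make the truncation concrete, here is a minimal sketch in plain C. It only models the effect and is not taken from the generated scanner or parser: the function names are hypothetical, the addresses are made up, and fixed-width types stand in for "8-byte pointer" and "4-byte int" so the result does not depend on the platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Model of the mismatch: the scanner stores a 64-bit address into a slot it
 * believes is 32 bits wide, and the parser later reads it back as an address.
 * (Assumption: uint64_t/uint32_t stand in for char* and int.) */
uint64_t through_int_sized_slot(uint64_t address)
{
    uint32_t slot = (uint32_t)address;  /* the upper 32 bits are discarded here */
    return (uint64_t)slot;              /* what the parser would get back */
}

/* Returns 1 if the address survives the round trip, 0 if it was corrupted. */
int survives_round_trip(uint64_t address)
{
    return through_int_sized_slot(address) == address;
}

/* Prints the value before and after the round trip. */
void report(uint64_t address)
{
    printf("stored 0x%016llx, read back 0x%016llx (%s)\n",
           (unsigned long long)address,
           (unsigned long long)through_int_sized_slot(address),
           survives_round_trip(address) ? "intact" : "corrupted");
}

/* Sanity checks on two made-up addresses. */
void self_check(void)
{
    assert(survives_round_trip(0x12345678ULL));           /* fits in 32 bits */
    assert(!survives_round_trip(0x00007ffd12345678ULL));  /* needs more than 32 bits */
}
```

Calling report(0x00007ffd12345678ULL) from a main function prints "stored 0x00007ffd12345678, read back 0x0000000012345678 (corrupted)". An address that happens to fit in 32 bits would come back intact, which is why a bug like this can go unnoticed on some systems.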

    Most C compilers will warn you about this truncation -- but only if you tell the compiler that you want to see warnings (-Wall for clang and gcc). Compiling with -Wall is always important, even when compiling the output of a code generator.

    You also need to fix the typo noted by @JakobStark.