I am trying to write a simple HTTP request parser for a school assignment, but I keep getting a segmentation fault that I can't get rid of. I think my production rules are OK. I ran Bison with tracing enabled, and it always segfaults at the part where it parses my header:
Reducing stack by rule 9 (line 59):
$1 = token ID ()
$2 = token COLON ()
$3 = token STRING ()
[4] 36661 segmentation fault (core dumped) ./problem1 < input.txt
Here is the content of my request.l file:
%option noyywrap
%{
#include<stdio.h>
#include "request.tab.h"
char *strclone(char *str);
%}
num [0-9]+(\.[0-9]{1,2})?
letter [a-zA-Z]
letternum [a-zA-Z0-9\-]
id {letter}{letternum}*
string \"[^"]*\"
fieldvalue {string}|{num}
%%
(GET|HEAD|POST|PUT|DELETE|OPTIONS) { yylval = strclone(yytext); return METHOD; }
HTTP\/{num} { yylval = strclone(yytext); return VERSION; }
{id} { yylval = strclone(yytext); return ID; }
"/" { return SLASH; }
"\n" { return NEWLINE; }
{string} { yylval = strclone(yytext); return STRING; }
":" { return COLON; }
[ \t\n]+ ;
. {
    printf("Unexpected: %c\nExiting...\n", *yytext);
    exit(0);
}
%%
char *strclone(char *str) {
    int len = strlen(str);
    char *clone = (char *)malloc(sizeof(char)*(len+1));
    strcpy(clone,str);
    return clone;
}
and my request.y file:
%{
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define YYSTYPE char*
extern int yylex();
extern int yyparse();
extern FILE* yyin;
void yyerror(const char* s);
%}
%token METHOD
%token SLASH
%token VERSION
%token STRING
%token ID
%token COLON
%token NEWLINE
%%
REQUEST: METHOD URI VERSION NEWLINE HEADERS {
        printf("%s %s", $1, $2);
    }
    ;
URI: SLASH DIR {
        $$ = (char *)malloc(sizeof(char)*(1+strlen($2)+1));
        sprintf($$, "//%s", $2);
    }
    ;
DIR: ID SLASH {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+2));
        sprintf($$, "%s//", $1);
    }
    |ID {
        $$ = $1;
    }
    | {
        $$ = "";
    }
    ;
HEADERS: HEADER {
        $$ = $1;
    }
    |HEADER NEWLINE HEADERS {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+1+strlen($3)+1));
        sprintf($$, "%s\n%s", $1, $3);
    }
    |{
        $$ = "";
    }
    ;
HEADER: ID COLON STRING {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+1+strlen($2)+1));
        sprintf($$, "%s:%s", $1, $3);
    }
    ;
%%
void yyerror (char const *s) {
    fprintf(stderr, "Poruka nije tacna\n");
}
int main() {
    yydebug = 1;
    yyin = stdin;
    do {
        yyparse();
    } while(!feof(yyin));
    return 0;
}
Also here is the content of my input.txt I am passing in as input:
GET / HTTP/1.1
Host: "developer.mozzila.org"
Accept-language: "fr"
In request.y, you include the directive
#define YYSTYPE char*
So in the parser code generated by Bison, yylval is of type char*. But that line is not inserted into request.l. So in the scanner code generated by Flex, yylval has its default type, int.
You could fix this by adding the definition of YYSTYPE to your request.l file, but then you have the same setting repeated in two places, which is a recipe for disaster. Instead, use Bison's declaration syntax:
%define api.value.type { char* }
(Note: that's a Bison declaration, not a C preprocessor define, so it goes with your other Bison % directives.)
The advantage of this solution is that Bison also adds the declaration to the header file it produces. Since that file is #included in request.l, no modifications need to be made to your scanner.
C, unfortunately, allows a pointer to be converted to an integer type even if the integer type is too narrow to hold the entire address, which is the case on a typical 64-bit platform with 8-byte pointers and 4-byte int. So in your scanner, assigning an eight-byte pointer to what the compiler thinks is a four-byte int means that the value will be truncated. When the parser then attempts to use it as an address, you'll get a segfault. If you're lucky.
Most C compilers will warn you about this truncation, but only if you tell the compiler that you want to see warnings (-Wall for clang and gcc). Compiling with -Wall is always important, even when compiling the output of a code generator.
You also need to fix the typo noted by @JakobStark.