I am writing an HTTP header parser in YACC. Since HTTP requests and responses have the same structure except for the first line, I hope to use the same parser for both. I tested request_line and response_line individually, and they work on an HTTP request and an HTTP response respectively. However, when I combine them in the following way, http_header only matches the HTTP request rules and raises "syntax error, unexpected t_backslash, expecting t_digit or t_dot or t_token_char or t_sp" when given the HTTP response HTTP/1.1 200 OK\r\nHost: foo.com\r\nConnection: Keep-alive\r\n\r\n. How can I make start_line match either request_line or response_line?
0 $accept: request $end
1 allowed_char_for_token: t_token_char
2 | t_digit
3 | t_dot
4 token: allowed_char_for_token
5 | token allowed_char_for_token
6 allowed_char_for_text: allowed_char_for_token
7 | t_separators
8 | t_colon
9 | t_backslash
10 text: allowed_char_for_text
11 | text ows allowed_char_for_text
12 ows: %empty
13 | t_sp
14 | t_ws
15 t_number: t_digit
16 | t_number t_digit
17 request_line: token t_sp text t_sp text t_crlf
18 response_line: text t_sp t_number t_sp text t_crlf
19 header: token ows t_colon ows text ows t_crlf
20 headers: header
21 | header headers
22 start_line: request_line
23 | response_line
24 http_headers: start_line headers t_crlf
(My apologies for the confusing names. By http_head I mean the first line plus the rest of the headers; I am not aware of an established name for it.)
You are feeding it a backslash instead of a carriage return/line feed. Clearly you copied a C string literal into something else that doesn't implement C string escaping conventions.
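To make the difference concrete, here is a minimal Python sketch (Python is used only for illustration; the helper name unescape_line_endings is hypothetical and not part of the question's code). It assumes the test input was pasted with the escapes left as literal two-character sequences, and converts them to real CR and LF characters before the input reaches the parser.

```python
def unescape_line_endings(raw: str) -> str:
    """Replace the literal two-character sequences backslash-r and
    backslash-n with real carriage-return and line-feed characters."""
    return raw.replace("\\r", "\r").replace("\\n", "\n")


# What the parser receives when a C string literal is pasted verbatim:
# the escapes are still a backslash followed by a letter.
pasted = r"HTTP/1.1 200 OK\r\nHost: foo.com\r\n\r\n"

# What an HTTP peer would actually send: real CRLF line endings.
wire = unescape_line_endings(pasted)

print(repr(pasted))  # the repr shows doubled backslashes (literal characters)
print(repr(wire))    # the repr shows single-backslash escapes (real CR/LF)
```

If the test input comes from a file or a shell argument, the same conversion has to happen somewhere before lexing, otherwise the lexer will keep producing t_backslash tokens.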
I wouldn't use something as precise as yacc for this task. I wouldn't use anything more precise than a hand-written tokenizer. And I would certainly not present the individual characters of an end-of-line sequence to the parser.
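As a rough illustration of the hand-written approach, the sketch below splits the head on CRLF up front and never exposes line-ending characters to the rest of the logic. It is only one possible shape, not a complete HTTP implementation: the function name parse_http_head, the returned tuple layout, and the rule "a start line beginning with HTTP/ is a response" are assumptions made for this example.

```python
from typing import Dict, Tuple


def parse_http_head(data: str) -> Tuple[str, Tuple[str, ...], Dict[str, str]]:
    """Parse a start line plus headers, terminated by an empty line.

    Returns (kind, start_line_fields, headers), where kind is
    "request" or "response". Header names are lower-cased.
    """
    # The head ends at the first empty line (CRLF CRLF).
    head, sep, _body = data.partition("\r\n\r\n")
    if not sep:
        raise ValueError("missing empty line after headers")

    lines = head.split("\r\n")

    # Both kinds of start line have three space-separated fields; the
    # last one (reason phrase or HTTP version) may itself contain spaces.
    fields = lines[0].split(" ", 2)
    if len(fields) != 3:
        raise ValueError(f"malformed start line: {lines[0]!r}")

    if fields[0].startswith("HTTP/"):
        # Response: version, status code, reason phrase.
        if not fields[1].isdigit():
            raise ValueError(f"status code is not numeric: {fields[1]!r}")
        kind = "response"
    else:
        # Request: method, target, version.
        kind = "request"

    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if not colon or not name.strip():
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()

    return kind, tuple(fields), headers


kind, start, headers = parse_http_head(
    "HTTP/1.1 200 OK\r\nHost: foo.com\r\nConnection: Keep-alive\r\n\r\n"
)
print(kind, start, headers)
```

The request/response decision is made once, on the first field of the start line, which is the same choice the start_line rule in the grammar is trying to express. Input that still contains literal backslash escapes has no real empty line, so this sketch rejects it with a ValueError instead of a confusing token-level error.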