Tags: parsing, grammar, yacc, context-free-grammar

Specifying two alternative rules for an expression in YACC


I am writing an HTTP header parser in YACC. Since HTTP requests and responses have the same structure except for the first line, I hope to use the same parser for both. I tested request_line and response_line individually, and they work on an HTTP request and an HTTP response respectively. However, when I combine them in the following way, http_header only matches HTTP requests. Given the HTTP response HTTP/1.1 200 OK\r\nHost: foo.com\r\nConnection: Keep-alive\r\n\r\n, it raises syntax error, unexpected t_backslash, expecting t_digit or t_dot or t_token_char or t_sp. How can I make start_line match either request_line or response_line?

0 $accept: request $end

1 allowed_char_for_token: t_token_char
2                       | t_digit
3                       | t_dot

4 token: allowed_char_for_token
5      | token allowed_char_for_token

6 allowed_char_for_text: allowed_char_for_token
7                      | t_separators
8                      | t_colon
9                      | t_backslash

10 text: allowed_char_for_text
11     | text ows allowed_char_for_text

12 ows: %empty
13    | t_sp
14    | t_ws

15 t_number: t_digit
16         | t_number t_digit

17 request_line: token t_sp text t_sp text t_crlf

18 response_line: text t_sp t_number t_sp text t_crlf

19 header: token ows t_colon ows text ows t_crlf

20 headers: header
21        | header headers

22 start_line: request_line
23           | response_line

24 http_headers: start_line headers t_crlf

(My apologies for the confusing names. By http_head I mean the first line plus the rest of the headers. I am not aware of a standard name for it.)


Solution

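  • You are feeding the parser a literal backslash instead of a carriage return/line feed pair, which is why the error reports an unexpected t_backslash. Evidently you copied a C string literal into something that doesn't implement C string escaping conventions, so \r\n arrives as four ordinary characters rather than two control characters.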

    I wouldn't use something as precise as yacc for this task. I wouldn't use anything more precise than a hand-written tokenizer, and I would certainly not present the individual characters of an end-of-line sequence to the parser.
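
To make both points concrete, here is a minimal sketch of a hand-written tokenizer in Python. The token names are borrowed from the grammar in the question, but the function tokenize and its character classes (in particular the separator set) are assumptions for illustration, not the asker's actual lexer. It emits the whole CR LF pair as a single t_crlf token, and it shows what happens when the input holds literal backslashes instead of real control characters.

```python
# Hypothetical tokenizer sketch; the character classes are assumed,
# only the token names come from the grammar in the question.
SEPARATORS = '()<>@,;"/[]?={}'  # assumed separator set


def tokenize(data: str) -> list[str]:
    """Return token names for data, emitting CR LF as one t_crlf token."""
    tokens = []
    i = 0
    while i < len(data):
        # Handle the end-of-line sequence as a unit, so the parser
        # never sees its individual characters.
        if data.startswith("\r\n", i):
            tokens.append("t_crlf")
            i += 2
            continue
        ch = data[i]
        if ch == " ":
            tokens.append("t_sp")
        elif ch == "\t":
            tokens.append("t_ws")
        elif ch == ":":
            tokens.append("t_colon")
        elif ch == "\\":
            tokens.append("t_backslash")
        elif ch == ".":
            tokens.append("t_dot")
        elif ch in "0123456789":
            tokens.append("t_digit")
        elif ch in SEPARATORS:
            tokens.append("t_separators")
        else:
            # Simplification: everything else, including a stray CR or LF,
            # is treated as an ordinary token character.
            tokens.append("t_token_char")
        i += 1
    return tokens


# Real control characters: Python decodes \r\n in a normal string literal.
print(tokenize("OK\r\n"))
# ['t_token_char', 't_token_char', 't_crlf']

# Literal backslashes: a raw string keeps \r\n as four ordinary characters,
# which is what a parser sees when the escapes were never decoded.
print(tokenize(r"OK\r\n"))
# ['t_token_char', 't_token_char', 't_backslash', 't_token_char', 't_backslash', 't_token_char']
```

The second call reproduces the symptom from the question: a t_backslash shows up where the grammar expects t_crlf. If the test input is stored in a file or passed on a command line, it needs to contain actual CR and LF bytes, not the two-character escapes.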