How should an HTTP header field be parsed?


I'm trying to parse an HTTP header field according to the ABNF rule header-field specified in the relevant section of RFC 7230. The rules are:

header-field   = field-name ":" OWS field-value OWS

field-name     = token
field-value    = *( field-content / obs-fold )
field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar    = VCHAR / obs-text

obs-fold       = CRLF 1*( SP / HTAB )
               ; obsolete line folding
               ; see Section 3.2.4

(obs-text is just high-order bytes 0x80 to 0xff).

The problem I'm facing is that the header-field rule seems to fail when applied to the User-Agent string that Chrome sets in responsive mode:

User-Agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Mobile Safari/537.36

The issue stems from the lone '5': when the parser reaches the final 's' in "Nexus", it consumes the 's', the following space, and the '5' as a single field-content. This leaves the parsing cursor at the space directly after the '5'. That is:

   Parsed:    ______________]
   Data:      ...6.0; Nexus 5 Build/MRA58N...
   Cursor:                   ^

Since field-content cannot begin with whitespace, the rule fails to match the whole header field, and the parser then fails to parse the rest of the message.
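
To make the failure concrete, here is a minimal Python sketch that transcribes the rules quoted above into a regular expression over bytes. The names FIELD_VALUE_7230 and matched_prefix are my own, not from the RFC or any library, and a regex is only a stand-in for a real ABNF parser.

```python
import re

# field-vchar = VCHAR / obs-text
FIELD_VCHAR = rb"[\x21-\x7e\x80-\xff]"
# obs-fold = CRLF 1*( SP / HTAB )
OBS_FOLD = rb"\r\n[ \t]+"
# field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
FIELD_CONTENT = FIELD_VCHAR + rb"(?:[ \t]+" + FIELD_VCHAR + rb")?"
# field-value = *( field-content / obs-fold )
FIELD_VALUE_7230 = re.compile(rb"(?:" + FIELD_CONTENT + rb"|" + OBS_FOLD + rb")*")


def matched_prefix(value: bytes) -> bytes:
    """Return the prefix of `value` that the literal field-value rule consumes."""
    return FIELD_VALUE_7230.match(value).group(0)


# The match stops right after the lone '5', at the space before "Build".
print(matched_prefix(b"Nexus 5 Build/MRA58N"))              # b'Nexus 5'
# No way of splitting the input lets the rule cover the whole value.
print(FIELD_VALUE_7230.fullmatch(b"Nexus 5 Build/MRA58N"))  # None
```

The same thing happens with the full User-Agent value: the match ends at "Nexus 5", and trying the shorter alternative (taking 's' alone) does not help, because the next field-content would then have to start at a space.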

It seems obvious to me that HTTP header values should be able to contain single characters surrounded by whitespace. However, this appears to be disallowed according to my reading of the spec.

I have searched online but have not found anything relevant, so I'm assuming the mistake is mine. Where is my mistake, and how should the rule actually be interpreted?


Solution

  • For RFCs, you can find errata as indicated on the front page:

    Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at
    http://www.rfc-editor.org/info/rfc7230.

    The relevant erratum is likely https://www.rfc-editor.org/errata/eid4189; see https://github.com/httpwg/http-core/issues/19 for more information.
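
As I read that erratum, the corrected production is field-content = field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ], which lets whitespace and visible characters mix freely between the first and last field-vchar. Below is a minimal Python sketch under that assumption. It ignores obs-fold (it assumes the line has already been unfolded), and the names HEADER_FIELD and parse_header_field are illustrative, not taken from any library.

```python
import re
from typing import Optional, Tuple

# field-name = token = 1*tchar
TOKEN = rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# field-vchar = VCHAR / obs-text (as a character-class body)
VCHAR = rb"\x21-\x7e\x80-\xff"
# Assumed corrected rule (erratum 4189):
#   field-content = field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ]
# Without obs-fold, a field-value is then either empty or a run of
# field-vchar / SP / HTAB that starts and ends with a field-vchar.
FIELD_VALUE_FIXED = (
    rb"(?:[" + VCHAR + rb"](?:[ \t" + VCHAR + rb"]*[" + VCHAR + rb"])?)?"
)
# header-field = field-name ":" OWS field-value OWS
HEADER_FIELD = re.compile(
    rb"(" + TOKEN + rb"):[ \t]*(" + FIELD_VALUE_FIXED + rb")[ \t]*"
)


def parse_header_field(line: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return (field-name, field-value), or None if the line does not match."""
    m = HEADER_FIELD.fullmatch(line)
    return (m.group(1), m.group(2)) if m else None


print(parse_header_field(b"User-Agent: Nexus 5 Build/MRA58N"))
# (b'User-Agent', b'Nexus 5 Build/MRA58N')
```

Collapsing the repetition into a single "starts and ends with a field-vchar" pattern is equivalent to the corrected grammar for unfolded values, and it avoids nesting one repetition inside another in the regex.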