httpparsingurlwebserverquery-string

Semicolon as URL query separator


Although it is strongly recommended (W3C source, via Wikipedia) for web servers to support semicolon as a separator of URL query items (in addition to ampersand) (at the time of writing, not anymore, via Wikipedia), it does not seem to be generally followed.

For example, compare

        http://www.google.com/search?q=nemo&oe=utf-8

        http://www.google.com/search?q=nemo;oe=utf-8

results. (In the latter case, semicolon is, or was at the time of writing this text, treated as ordinary string character, as if the url was: http://www.google.com/search?q=nemo%3Boe=utf-8)

Although the first URL parsing library i tried, behaves well:

>>> from urlparse import urlparse, query_qs
>>> url = 'http://www.google.com/search?q=nemo;oe=utf-8'
>>> parse_qs(urlparse(url).query)
{'q': ['nemo'], 'oe': ['utf-8']}

What is the current status of accepting semicolon as a separator, and what are potential issues or some interesting notes? (from both server and client point of view)


Solution

  • The W3C Recommendation from 1999 is obsolete. The current status, according to the 2014 W3C Recommendation, is that semicolon is now illegal as a parameter separator:

    To decode application/x-www-form-urlencoded payloads, the following algorithm should be used. [...] The output of this algorithm is a sorted list of name-value pairs. [...]

    1. Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).

    In other words, ?foo=bar;baz means the parameter foo will have the value bar;baz; whereas ?foo=bar;baz=sna should result in foo being bar;baz=sna (although technically illegal since the second = should be escaped to %3D).