Odd behavior with urlparse


I was wondering if there are known workarounds for some odd behavior I'm seeing with Python's urlparse.

Here are some results from a couple of lines in the Python interpreter:

>>> import urlparse
>>> urlparse.parse_qsl('https://localhost/?code=bork&charlie=brown')
[('https://localhost/?code', 'bork'), ('charlie', 'brown')]

In the above example, why is the key for the first value 'https://localhost/?code'? Shouldn't it just be 'code'? Note: parse_qs has the same bad behavior.

>>> urlparse.urlparse('abcd://location/?code=bork&charlie=brown')
ParseResult(scheme='abcd', netloc='location', path='/?code=bork&charlie=brown', params='', query='', fragment='')
>>> urlparse.urlparse('https://location/?code=bork&charlie=brown')
ParseResult(scheme='https', netloc='location', path='/', params='', query='code=bork&charlie=brown', fragment='')

In the above example, note that the query string doesn't always end up in the query field. Why does the scheme matter at all? Shouldn't the query field always get the query string? Testing with 'ftp' and other well-known schemes seems to show the same problem.


Solution

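  • urlparse.parse_qsl (and urlparse.parse_qs) are functions intended for the query part of the URL only (the string after the ?). Given a whole URL, they treat everything before the first = as the first key, which is why you get 'https://localhost/?code'.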

    You probably want to parse the whole URL first with a function that understands it (urlparse.urlparse), and then pass the query from the result to urlparse.parse_qsl or urlparse.parse_qs:

    >>> import urlparse
    >>> myurl = urlparse.urlparse('https://localhost/?code=bork&charlie=brown')
    >>> print myurl
    ParseResult(scheme='https', netloc='localhost', path='/', params='', query='code=bork&charlie=brown', fragment='')
    >>> print myurl.scheme
    https
    >>> print urlparse.parse_qs(myurl.query)
    {'charlie': ['brown'], 'code': ['bork']}
    

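    The scheme matters because, although the query is part of the generic URI syntax, some schemes may not support it. That appears to be why urlparse leaves the query in the path for schemes it does not expect to carry one.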

    See also:

    http://en.wikipedia.org/wiki/URI_scheme (check out the official registered schemes)
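
As a workaround, the two-step approach above can be wrapped in a small helper. This is only a sketch, written for Python 3, where these functions live in urllib.parse rather than urlparse. The name query_pairs is hypothetical, and the fallback branch assumes a parser that leaves the query inside the path for some schemes, as in the question:

```python
# Python 3 sketch; in Python 2 the same functions live in the `urlparse` module.
from urllib.parse import urlsplit, parse_qsl


def query_pairs(url):
    """Return the (key, value) pairs from the query string of a full URL.

    query_pairs is a hypothetical helper name, not a standard-library function.
    """
    parts = urlsplit(url)
    query = parts.query
    # Fallback (assumption): some parser versions leave '?...' in the path
    # for schemes they do not expect to carry a query, so split it by hand.
    if not query and "?" in parts.path:
        query = parts.path.partition("?")[2]
    return parse_qsl(query)


if __name__ == "__main__":
    print(query_pairs("https://localhost/?code=bork&charlie=brown"))
    print(query_pairs("abcd://location/?code=bork&charlie=brown"))
```

On a current Python 3 both calls should give the same list of pairs, since urlsplit there separates the query regardless of scheme. The manual split only matters on versions that behave like the one in the question.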