Tags: python, url, python-2.6, urlparse

Parse custom URIs with urlparse (Python)


My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module does not parse unknown URL schemes the way it parses http: the netloc and the fragment are left inside the path.

If I do not adjust urlparse's uses_* lists, I get this:

>>> urlparse.urlparse("qqqq://base/id#hint")
('qqqq', '', '//base/id#hint', '', '', '')
>>> urlparse.urlparse("http://base/id#hint")
('http', 'base', '/id', '', '', 'hint')

Here is what I do, and I wonder if there is a better way to do it:

import urlparse

SCHEME = "qqqq"

# One would hope that there was a better way to do this
urlparse.uses_netloc.append(SCHEME)
urlparse.uses_fragment.append(SCHEME)

Why is there no better way to do this?


Solution

  • I think the problem is that URIs don't all share a common format after the scheme. For example, mailto: URLs aren't structured the same way as http: URLs.

    I would take the result of the first parse, synthesize an http URL from the unparsed remainder, and parse that again. The original scheme is still available as parts[0]:

    import urlparse

    parts = urlparse.urlparse("qqqq://base/id#hint")
    # parts[2] is '//base/id#hint'; re-parse it under a scheme urlparse knows
    fake_url = "http:" + parts[2]
    parts2 = urlparse.urlparse(fake_url)
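
The re-parse idea above could be wrapped in a small helper so the original scheme is put back into the result. This is only a sketch: the name parse_custom_uri is made up for illustration, it returns a plain tuple rather than a ParseResult, and the import fallback is an assumption so that the same code also runs on Python 3, where the module lives at urllib.parse.

```python
try:
    from urllib.parse import urlparse  # Python 3 location (assumption: newer interpreter)
except ImportError:
    from urlparse import urlparse      # Python 2, as in the question


def parse_custom_uri(uri):
    """Parse a URI with an unknown scheme as if it were an http URL.

    Hypothetical helper: swap the scheme for "http", parse, then
    restore the original scheme in the returned 6-tuple.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError("no scheme found in %r" % (uri,))
    parsed = urlparse("http:" + rest)
    # Keep every field from the http parse except the scheme itself.
    return (scheme,) + tuple(parsed)[1:]


parts = parse_custom_uri("qqqq://base/id#hint")
print(parts)  # ('qqqq', 'base', '/id', '', '', 'hint')
```

Unlike appending to the uses_* lists, this leaves urlparse's module-level state untouched, at the cost of treating every custom URI as if it followed http's structure, which, as noted above, not all schemes do.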