
Python 3 urllib urlparse URI parsing


I'm a little puzzled. I hope somebody can help me =)

The result of Python's urlparse function depends on the scheme specified in the URI.

For example, this call returns '/path;'

urllib.parse.urlparse('some://foo.bar/path;').path

But this call returns '/path'

urllib.parse.urlparse('http://foo.bar/path;').path

As I understand it, the first URI is parsed according to RFC 3986, but the second one according to RFC 2396. Am I right? And how can I parse any string the way RFC 3986 describes it?
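A quick way to see where the scheme dependence comes from, assuming CPython's urllib.parse: urlparse only splits ";parameters" off the last path segment when the scheme appears in the module-level list uses_params. This sketch just prints that membership next to each parse result.

```python
from urllib.parse import urlparse, uses_params

# urlparse separates ";params" from the path only for schemes listed in
# urllib.parse.uses_params; "http" is listed there, "some" is not.
for url in ("some://foo.bar/path;", "http://foo.bar/path;"):
    result = urlparse(url)
    print(result.scheme, result.scheme in uses_params,
          repr(result.path), repr(result.params))
```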


Solution

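  • urlparse splits ;parameters off the last path segment only for schemes listed in urllib.parse.uses_params (http is one of them, some is not), which is why the two calls give different paths. If you don't want the parameters split from the path, use urlsplit, which never does this.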

    import urllib.parse
    urllib.parse.urlsplit('http://foo.bar/path;')

    Output

    SplitResult(scheme='http', netloc='foo.bar', path='/path;', query='', fragment='')
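To check that urlsplit removes the scheme dependence, here is a small comparison over the two URLs from the question. It is only a sketch: it prints each scheme and path and uses geturl() to confirm that the pieces reassemble into the original string.

```python
from urllib.parse import urlsplit

# urlsplit never separates ";params" from the path, so both schemes
# keep the trailing ";" in the path component.
for url in ("some://foo.bar/path;", "http://foo.bar/path;"):
    parts = urlsplit(url)
    print(parts.scheme, repr(parts.path))
    # geturl() rebuilds the URL from the split components
    assert parts.geturl() == url
```

If you later need the parameters, they are still in the path string, so you can split them off yourself only where your application expects them.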