
Why does urlparse.urlunparse behave inconsistently?


When netloc is empty, urlparse.urlunparse behaves differently depending on the scheme:

>>> urlparse.urlunparse(('http','','test_path', None, None, None))
'http:///test_path'
>>> urlparse.urlunparse(('ftp','','test_path', None, None, None))
'ftp:///test_path'
>>> urlparse.urlunparse(('ssh','','test_path', None, None, None))
'ssh:test_path'

Is this a bug or a feature? I would expect urlunparse to always behave as in the first example, even if the scheme is not recognized.


Solution

  • The data tuple you are passing to urlunparse has six components:

    scheme, netloc, url, params, query, fragment = data
    

    When there is no netloc and the scheme is not in uses_netloc, no '//' is inserted, so the url is simply

        url = scheme + ':' + url
    

    That follows from how urlunparse is defined: it appends params (if any) to url and then calls urlunsplit:

    def urlunsplit(data):
        ...
        scheme, netloc, url, query, fragment = data
        if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
            if url and url[:1] != '/': url = '/' + url
            url = '//' + (netloc or '') + url
        if scheme:
            url = scheme + ':' + url
    

    Note that 'ssh' is not in uses_netloc:

    uses_netloc = ['ftp', 'http', 'gopher', 'nntp', 'telnet',
                   'imap', 'wais', 'file', 'mms', 'https', 'shttp',
                   'snews', 'prospero', 'rtsp', 'rtspu', 'rsync', '',
                   'svn', 'svn+ssh', 'sftp','nfs','git', 'git+ssh']
    

    You do get a URL that begins with ssh:// if you supply a netloc:

    In [140]: urlparse.urlunparse(('ssh','netloc','test_path', None, None, None))
    Out[140]: 'ssh://netloc/test_path'
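
To make the branch logic above concrete, here is a minimal, self-contained sketch that re-implements only the scheme/netloc part of the quoted urlunsplit code. The names unsplit_sketch and USES_NETLOC_SAMPLE are hypothetical (they are not part of the library), and the set is just a small subset of the real uses_netloc list, so treat this as an illustration rather than the library's implementation.

```python
# Hypothetical subset of the real uses_netloc list, for illustration only.
USES_NETLOC_SAMPLE = {'ftp', 'http', 'https', 'file', ''}


def unsplit_sketch(scheme, netloc, path):
    """Mimic the scheme/netloc branch of the urlunsplit code quoted above."""
    url = path
    # '//' is only emitted when a netloc is given, or when the scheme is
    # known to take one and the path does not already start with '//'.
    if netloc or (scheme and scheme in USES_NETLOC_SAMPLE and url[:2] != '//'):
        if url and url[:1] != '/':
            url = '/' + url
        url = '//' + (netloc or '') + url
    if scheme:
        url = scheme + ':' + url
    return url


print(unsplit_sketch('http', '', 'test_path'))       # http:///test_path
print(unsplit_sketch('ssh', '', 'test_path'))        # ssh:test_path
print(unsplit_sketch('ssh', 'netloc', 'test_path'))  # ssh://netloc/test_path
```

The three calls reproduce the outputs from the question and the answer: an unrecognized scheme with an empty netloc skips the first branch entirely, which is why no '//' appears.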