
urlparse/urlsplit and urlunparse, what's the Pythonic way to do this?


The background (though this is not a Django-only question) is that the Django test server does not return a scheme or netloc in its response and request URLs.

I get /foo/bar for example, and I want to end up with http://localhost:8000/foo/bar.

urllib.parse.urlparse (but not so much urllib.parse.urlsplit) makes it easy to gather the relevant bits of information from the test URL and my known server address. What seems more complicated than necessary is recomposing a new URL with the scheme and netloc added via urllib.parse.urlunparse, which takes its components positionally (as a single sequence) but does not say in its docstring what they are, nor support named arguments. Meanwhile, the parsing functions return immutable tuples...

def urlunparse(components):
    """Put a parsed URL back together again.  This may result in a ..."""
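For reference, the positional order that urlunparse expects can be read from the named tuple itself via its `_fields` attribute. A minimal sketch, using the question's `/foo/bar` path and `localhost:8000` server:

```python
from urllib.parse import urlparse, urlunparse

parsed = urlparse("/foo/bar")

# ParseResult is a named tuple; _fields gives the positional order urlunparse expects
print(parsed._fields)

# urlunparse takes a single argument: any six-item iterable in that same order
rebuilt = urlunparse(("http", "localhost:8000", "/foo/bar", "", "", ""))
print(rebuilt)
```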

I did get it working (see the code below), but it looks really kludgy, especially the part where I first have to turn the parse tuple into a list and then modify the list at the needed index positions.

Is there a more Pythonic way?

sample code:


from urllib.parse import urlsplit, parse_qs, urlunparse, urlparse, urlencode, ParseResult, SplitResult

server_at_ = "http://localhost:8000"
url_in = "/foo/bar"  # this comes from the Django test framework; I want to change it to "http://localhost:8000/foo/bar"

from_server = urlparse(server_at_)
print("  scheme and netloc from server:",from_server)


print(f"{url_in=}")
from_urlparse = urlparse(url_in)

print("  missing scheme and netloc:",from_urlparse)

#this works
print("I can rebuild it unchanged :",urlunparse(from_urlparse))

# however, using the modern urlsplit doesn't work (I didn't know about urlunsplit when asking)
try:
    print("using urlsplit", urlunparse(urlsplit(url_in)))
#pragma: no cover pylint: disable=unused-variable
except (Exception,) as e: 
    print("no luck with urlsplit though:", e)


#let's modify the urlparse results to add the scheme and netloc
try:
    from_urlparse.scheme = from_server.scheme
    from_urlparse.netloc = from_server.netloc
    new_url = urlunparse(from_urlparse)
except (Exception,) as e: 
    print("can't modify tuples:", e)


# UGGGH, this works, but is there a better way?
parts = [v for v in from_urlparse]
parts[0] = from_server.scheme
parts[1] = from_server.netloc

print("finally:",urlunparse(parts))

sample output:

  scheme and netloc from server: ParseResult(scheme='http', netloc='localhost:8000', path='', params='', query='', fragment='')
url_in='/foo/bar'
  missing scheme and netloc: ParseResult(scheme='', netloc='', path='/foo/bar', params='', query='', fragment='')
I can rebuild it unchanged : /foo/bar
no luck with urlsplit though: not enough values to unpack (expected 7, got 6)
can't modify tuples: can't set attribute
finally: http://localhost:8000/foo/bar

Solution

  • If you need it in Django, I found request.build_absolute_uri() in the question

    How can I get the full/absolute URL (with domain) in Django?

    I haven't tested it, but it may solve this problem in Django.

    Other modules/frameworks may also have their own functions for this.

    If I remember correctly, the scraping framework Scrapy has its own function, response.urljoin(), to convert a relative URL into an absolute one.


    As for the functions in urllib.parse, you have to pair them up:

    • urlsplit with urlunsplit (five components: scheme, netloc, path, query, fragment)
    • urlparse with urlunparse (six components: the same plus params)
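This pairing explains the question's "not enough values to unpack (expected 7, got 6)" error: a five-field SplitResult was handed to urlunparse, which expects six. A small sketch comparing the two result types; the sample URL with `;p`, a query and a fragment is made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit, urlparse, urlunparse

url = "/foo/bar;p?q=1#frag"

split = urlsplit(url)   # 5 fields: ";p" stays inside path
parsed = urlparse(url)  # 6 fields: ";p" is split out into params

print(len(split), split._fields)
print(len(parsed), parsed._fields)

# Each parser round-trips with its own un-parser
assert urlunsplit(split) == url
assert urlunparse(parsed) == url
```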

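    There is the named-tuple method _replace(), which creates a new ParseResult with the given values replaced. Despite the leading underscore it is not private: it is part of the documented namedtuple API, and the underscore only avoids clashes with field names.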

    new_urlparse = from_urlparse._replace(scheme=from_server.scheme, netloc=from_server.netloc)
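Put together with the question's variables, a runnable sketch of the _replace() approach. Since ParseResult is immutable, _replace() returns a new tuple instead of modifying it in place (which is why the question's attribute assignment failed). Query and fragment, if present, are carried over unchanged:

```python
from urllib.parse import urlparse, urlunparse

server_at_ = "http://localhost:8000"
url_in = "/foo/bar"

from_server = urlparse(server_at_)
from_urlparse = urlparse(url_in)

# _replace returns a NEW ParseResult; from_urlparse itself is left untouched
new_parts = from_urlparse._replace(scheme=from_server.scheme, netloc=from_server.netloc)
new_url = urlunparse(new_parts)  # new_parts.geturl() gives the same string
print(new_url)

# query and fragment are preserved as-is
with_query = urlparse("/foo/bar?x=1#top")._replace(scheme="http", netloc="localhost:8000")
print(with_query.geturl())
```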
    

    Usually, though, all I need is urljoin():

    from urllib.parse import urljoin

    server_at_ = "http://localhost:8000"  # base
    url_in = "/foo/bar"                   # relative url

    absolute_url = urljoin(server_at_, url_in)
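One thing to keep in mind with urljoin() is that it resolves the second argument the way a browser resolves a link, so the result depends on whether the path is absolute and on whether the base ends in a slash. A short sketch; the `/app` base paths and `example.com` URL are made up for illustration:

```python
from urllib.parse import urljoin

server_at_ = "http://localhost:8000"

# An absolute path ("/...") replaces whatever path the base has
print(urljoin(server_at_, "/foo/bar"))                    # http://localhost:8000/foo/bar
print(urljoin("http://localhost:8000/app/", "/foo/bar"))  # http://localhost:8000/foo/bar

# A relative path is resolved against the base's "directory",
# so a trailing slash on the base matters
print(urljoin("http://localhost:8000/app/", "foo/bar"))   # http://localhost:8000/app/foo/bar
print(urljoin("http://localhost:8000/app", "foo/bar"))    # http://localhost:8000/foo/bar

# An already-absolute URL is returned unchanged
print(urljoin(server_at_, "http://example.com/x"))        # http://example.com/x
```

For the Django test case, where url_in always starts with "/", urljoin(server_at_, url_in) does exactly what the question asks.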