Tags: python, python-3.x, url, urllib, urlparse

Concatenate base URL and path using urllib


I am trying to concatenate a base URL url1 and a relative path url2 using Python 3's urllib.parse, but I do not get the desired result. I also tried os.path.join (which is not meant for this purpose) and simple string concatenation using .format():

import os.path
import urllib.parse

url1 = "www.sampleurl.tld"
url2 = "/some/path/here"


print(urllib.parse.urljoin(url1, url2))
# --> "/some/path/here"

print(os.path.join(url1, url2))
# --> "/some/path/here"

print("{}{}".format(url1, url2))
# --> "www.sampleurl.tld/some/path/here" (desired output)

The simple string concatenation returns the desired absolute URL. However, this approach seems naive and inelegant, since it assumes that url2 starts with /, which may not be the case. Of course, I could check this by calling url2.startswith('/') and switch the string concatenation to "{}/{}".format(url1, url2) when it does not, but I am still wondering how to do this properly with urllib.parse.


Solution

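  • urljoin expects its first argument, base, to include the scheme. Without one, www.sampleurl.tld is parsed as a relative path rather than a host, so the absolute path in url2 replaces it entirely.

    So adding https:// (or http://) to your url1 string should do the job.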

    import urllib.parse
    
    url1 = "https://www.sampleurl.tld"
    url2 = "/some/path/here"
    
    
    print(urllib.parse.urljoin(url1, url2))
    # --> "https://www.sampleurl.tld/some/path/here"
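
To also cover the case from the question where url2 may or may not start with /, one option is a small wrapper around urljoin. The following is only a sketch: join_url and its default_scheme parameter are hypothetical names, not part of urllib, and it assumes that a base without "://" is a bare host name that should get https.

```python
from urllib.parse import urljoin


def join_url(base, path, default_scheme="https"):
    """Hypothetical helper: append path to base regardless of slashes."""
    # Assumption: a base without "://" is a bare host that needs a scheme.
    if "://" not in base:
        base = f"{default_scheme}://{base}"
    # A trailing slash on base keeps its last path segment; stripping the
    # leading slash from path makes urljoin treat it as relative.
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


print(join_url("www.sampleurl.tld", "/some/path/here"))
# --> "https://www.sampleurl.tld/some/path/here"
print(join_url("www.sampleurl.tld", "some/path/here"))
# --> "https://www.sampleurl.tld/some/path/here"
print(join_url("https://www.sampleurl.tld/api/v1", "users"))
# --> "https://www.sampleurl.tld/api/v1/users"
```

The slash handling matters because plain urljoin("https://www.sampleurl.tld/api/v1", "users") replaces the last segment and returns "https://www.sampleurl.tld/api/users". Note that this wrapper always appends, so it deliberately gives up urljoin's behavior of letting a leading / in the second argument reset the path.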