python, python-2.7, url

How can I prepend http to a url if it doesn't begin with http?


I have urls formatted as:

google.com
www.google.com
http://google.com
http://www.google.com

I would like to convert all of these to a uniform format that starts with http://, for example:

http://google.com

How can I prepend URLs with http:// using Python?


Solution

  • Python has built-in functions that handle this correctly, for example urlparse:

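    import urlparse  # Python 2 module name

    # my_url is the input string; 'http' is used if it has no scheme
    p = urlparse.urlparse(my_url, 'http')
    # Without a scheme, the host is parsed into .path, not .netloc
    netloc = p.netloc or p.path
    path = p.path if p.netloc else ''
    # Optional: force a 'www.' prefix on the host
    if not netloc.startswith('www.'):
        netloc = 'www.' + netloc

    # Build a new result with the corrected fields
    p = urlparse.ParseResult('http', netloc, path, *p[3:])
    print(p.geturl())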
    

    If you want to remove (or add) the www part, change the netloc value before calling .geturl(). If you do not want the prefix added, drop the startswith('www.') check.

    Because ParseResult is a namedtuple, you cannot modify it in place; you have to create a new object with the changed values.

    PS:

    In Python 3, the function is urllib.parse.urlparse.
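
For reference, the same idea can be sketched for Python 3. This is only an illustration under a few assumptions: ensure_http and its add_www flag are hypothetical names that do not appear in the answer above, inputs are assumed to have no port and no userinfo, and an existing scheme such as https is left unchanged instead of being forced to http. It uses the namedtuple method _replace to create the new ParseResult.

```python
from urllib.parse import urlparse


def ensure_http(url, add_www=False):
    """Return url with an http:// scheme if it had none (hypothetical helper)."""
    # 'http' is only a default; a scheme already present in url wins.
    p = urlparse(url, scheme="http")

    if p.netloc:
        netloc, path = p.netloc, p.path
    else:
        # Without a scheme, 'google.com/x' lands entirely in .path,
        # so split the host from the rest of the path by hand.
        netloc, sep, rest = p.path.partition("/")
        path = sep + rest

    # Assumption: the 'www.' prefix is opt-in rather than always added.
    if add_www and not netloc.startswith("www."):
        netloc = "www." + netloc

    # ParseResult is immutable; _replace returns a modified copy.
    return p._replace(netloc=netloc, path=path).geturl()


if __name__ == "__main__":
    for u in ["google.com", "www.google.com",
              "http://google.com", "http://www.google.com"]:
        print(ensure_http(u))
```

Calling ensure_http(u, add_www=True) would also normalize the host to the www form, which matches the behavior of the Python 2 snippet in the answer.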