Search code examples
pythonregexurl

Python: Add a trailing slash to the URL but only if the URL doesn't end in a slash already or a file extension


I want, in Python, normalize a URL. My main purpose is to add slash / at the end of the URL if it is not already present but only if the URL doesn't end in a slash already or a file extension (so images, .php ,files pages, etc. aren't affected).

For example, if it is http://www.example.com then it should be converted to http://www.example.com/. But if it is http://www.example.com/image.png then it should not be affected.

To do this, I use this regular expression /([^/.]+)$. Regex demo

But it doesn't work in this python code, start_url is not modified

import re

start_url = "https://zonetuto.fr"
start_url = re.sub(r'/([^/.]+)$', r'/\1/', start_url)
print(start_url)

Solution

  • You could handle this by considering how a URL is constructed.

    The part of the URL that follows the netloc is known as the path

    For example:

    https://www.example.com does not have a path

    https://www.example.com/ has a path - i.e., just the /

    https://www.example.com/banana has a path - i.e., /banana

    Therefore you could utilise urllib.parse as follows:

    from urllib.parse import urlparse
    
    def normalise(url):
        return url if urlparse(url).path else url + "/"
    
    urls = [
        "https://www.example.com",
        "https://www.example.com/",
        "https://www.example.com/banana"
    ]
    
    for url in urls:
        print(normalise(url))
    

    Output:

    https://www.example.com/
    https://www.example.com/
    https://www.example.com/banana