Python: Add a trailing slash to the URL but only if the URL doesn't end in a slash already or a file extension

I want to normalize a URL in Python. My main purpose is to add a slash (/) to the end of the URL, but only if the URL doesn't already end in a slash or a file extension (so images, .php files, pages, etc. aren't affected).

For example, if it is http://www.example.com then it should be converted to http://www.example.com/. But if it is http://www.example.com/image.png then it should not be affected.

To do this, I use this regular expression: /([^/.]+)$

But it doesn't work in this Python code: start_url is not modified.

import re

start_url = "https://zonetuto.fr"
start_url = re.sub(r'/([^/.]+)$', r'/\1/', start_url)
print(start_url)
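The substitution leaves the string untouched because re.sub finds no match. The pattern needs a final slash followed only by characters that are neither / nor ., but in https://zonetuto.fr the text after the last slash is zonetuto.fr, which contains a dot. A quick check (the helper name matches is only for illustration) makes this visible:

```python
import re

# The pattern from the question: a "/" followed by one or more characters
# that are neither "/" nor ".", running to the end of the string.
PATTERN = r'/([^/.]+)$'

def matches(url):
    """Return the text captured after the last slash, or None if no match."""
    m = re.search(PATTERN, url)
    return m.group(1) if m else None

# The host name "zonetuto.fr" contains a dot, so the pattern cannot match.
print(matches("https://zonetuto.fr"))       # None
# A final path segment without a dot does match.
print(matches("https://zonetuto.fr/blog"))  # blog
```

In other words, the regex can only ever act on a path segment, never on a bare domain, which is exactly the case the question starts with.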

Solution

You could handle this by considering how a URL is constructed.

The part of the URL that follows the netloc (the network location, e.g. www.example.com) is known as the path.

For example:

https://www.example.com does not have a path

https://www.example.com/ has a path - i.e., just the /

https://www.example.com/banana has a path - i.e., /banana
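These values can be checked directly with urllib.parse.urlparse. A missing path comes back as an empty string, which is falsy in Python, and that is what the check below relies on:

```python
from urllib.parse import urlparse

# Show the netloc and path that urlparse extracts from each example URL.
for url in ["https://www.example.com",
            "https://www.example.com/",
            "https://www.example.com/banana"]:
    parts = urlparse(url)
    print(repr(parts.netloc), repr(parts.path))
```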

Therefore you could use urllib.parse to append a slash only when the path is empty:

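from urllib.parse import urlparse

def normalise(url):
    parts = urlparse(url)
    # A bare domain has an empty path; give it "/" and rebuild the URL so that
    # any query string or fragment stays after the slash.
    return url if parts.path else parts._replace(path="/").geturl()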

urls = [
    "https://www.example.com",
    "https://www.example.com/",
    "https://www.example.com/banana"
]

for url in urls:
    print(normalise(url))

Output:

https://www.example.com/
https://www.example.com/
https://www.example.com/banana
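The function above only appends a slash when the path is empty, so a path such as /banana is left as it is. If, as the question asks, every URL whose last path segment has no file extension should also get a trailing slash, one possible sketch is below. It assumes that "has a file extension" means posixpath.splitext finds an extension in the final path segment; add_trailing_slash is a hypothetical name.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

def add_trailing_slash(url):
    """Append "/" to the path unless it already ends in "/" or the last
    segment has an extension according to posixpath.splitext (assumption:
    that is what counts as a file such as image.png or index.php)."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/"):
        return url  # already has a trailing slash
    last_segment = path.rsplit("/", 1)[-1]
    if posixpath.splitext(last_segment)[1]:
        return url  # looks like a file, leave it untouched
    # Rebuild the URL so any query string or fragment stays after the slash.
    return urlunsplit(parts._replace(path=path + "/"))

for url in ["https://www.example.com",
            "https://www.example.com/",
            "https://www.example.com/banana",
            "https://www.example.com/image.png"]:
    print(add_trailing_slash(url))
```

Only the path is inspected, so the dot in a host name such as zonetuto.fr no longer gets in the way. The trade-off is that a directory-style segment containing a dot (for example /v1.2) is treated as a file and left without a slash.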