Python: URL parsing issue while adding a trailing slash


I was developing a small experiment in Python to normalize a URL. My main goal is to add a slash (/) at the end of the URL if it is not already present. For example, http://www.example.com should be converted to http://www.example.com/

Here is a small snippet for the same:

if url[len(url)-1] != "/":
        url = url + "/"

But this also converts file names. For example, http://www.example.com/image.png becomes http://www.example.com/image.png/, which is wrong. I only want to add a slash to directories, not to file names. How do I do this?

Thanks in advance!


Solution

  • You could pattern match on the end of the URL to tell known domains apart from file extensions. It's not too difficult to enumerate at least the basic top-level domains like .com, .gov, .org, etc.

    If you are familiar with regular expressions, you can match on a pattern like r'\.com$' (the dot is escaped so that it matches a literal '.' rather than any character).

    Otherwise, you can split by '.' and check the last substring you get:

    In [32]: url_png = 'http://www.example.com/image.png'
    
    In [33]: url_com = 'http://www.example.com'
    
    In [34]: domains = ['com', 'org', 'gov']
    
    In [35]: for url in [url_png, url_com]:
       ....:     suffix = url.split('.')[-1]
       ....:     if suffix in domains:
       ....:         print(url)
       ....:
    http://www.example.com
    

    As a side note, you don't need url[len(url)-1] to get the last character of a string; the Pythonic way is just url[-1], the same negative indexing used on the split result in the example above.
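
  • A different approach from the domain list above, offered only as a rough sketch: parse the URL with the standard library's urllib.parse and look at the last path segment instead of the whole string. The function name add_trailing_slash and the rule "a dot in the last segment means it is a file" are assumptions made for illustration, not something a URL can actually guarantee.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url):
    """Append '/' to the URL path unless the last segment looks like a file.

    Hypothetical helper: "looks like a file" is a heuristic, not a rule.
    """
    parts = urlsplit(url)
    path = parts.path

    # Already ends with a slash: nothing to do.
    if path.endswith("/"):
        return url

    # Only inspect the final path segment, so dots in the host name
    # (www.example.com) are never mistaken for a file extension.
    last_segment = path.rsplit("/", 1)[-1]
    _, extension = posixpath.splitext(last_segment)
    if extension:
        return url

    # Rebuild the URL so any query string or fragment stays after the slash.
    return urlunsplit(parts._replace(path=path + "/"))


for candidate in ("http://www.example.com",
                  "http://www.example.com/image.png",
                  "http://www.example.com/docs?page=2"):
    print(add_trailing_slash(candidate))
```

    Because only the path is inspected, there is no list of domains to maintain. The heuristic still misfires on directory names that contain a dot (for example a version such as v1.2) and on files without an extension, so treat it as a starting point.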