Tags: python, urllib, python-re, url-parsing

How to get a specific part of a URL using urlparse()?


I have a URL like this:

url = 'https://grabagun.com/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'

When I use the urlparse() function, I get a result like this:

>>> from urllib.parse import urlparse
>>> url = urlparse(url)
>>> url.path
'/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'

Is it possible to get something like this:

path1 = "firearms"
path2 = "handguns"
path3 = "semi-automatic-handguns"

I also don't want any segment that ends in ".html".


Solution

  • A full URL contains both single slashes and a double slash (the // after the scheme), so normalize them to the same separator first if you want to split the URL string itself. With url.path you can split directly:

    url = '/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'
    
    url = url.split('/')
    url = list(filter(None, url))  # remove empty elements
    url.pop()  # drop the last segment (the ".html" file name)
    print(url)
    

    Output:

    ['firearms', 'handguns', 'semi-automatic-handguns']
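The same idea can be applied to the full URL in one step. The following is a minimal sketch, not the only way to do it: the helper name path_segments and its skip_suffix parameter are made up for illustration. Instead of always popping the last element, it filters out any segment that ends with ".html", which also covers paths that do not end in a file name.

```python
from urllib.parse import urlparse


def path_segments(url, skip_suffix=".html"):
    """Return the non-empty path segments of url, skipping those ending in skip_suffix.

    Hypothetical helper for illustration; the name and parameter are assumptions.
    """
    path = urlparse(url).path
    # split('/') yields empty strings for leading, trailing, or doubled slashes,
    # so the "if seg" check drops them.
    return [seg for seg in path.split("/") if seg and not seg.endswith(skip_suffix)]


url = 'https://grabagun.com/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'
print(path_segments(url))  # ['firearms', 'handguns', 'semi-automatic-handguns']
```

Because the filter checks the suffix rather than the position, a URL such as 'https://example.com/a//b/' would give ['a', 'b'] without losing the last real segment.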
    

    Part 2

    If you want to turn the segments into variables, iterate over the list and create one variable per segment:

    # start=1 so that the first segment becomes path1, as asked in the question
    for n, val in enumerate(url, start=1):
        globals()["path%d" % n] = val
    
    print(path1)
    

    Output:

    firearms
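Creating names through globals() works at module level, but the names are invisible to linters and easy to overwrite by accident. As an alternative sketch, assuming the list of segments from above (the names segments and paths are chosen here for illustration), a dictionary or plain tuple unpacking keeps the values addressable without touching the global namespace:

```python
segments = ['firearms', 'handguns', 'semi-automatic-handguns']

# Option 1: a dictionary keyed by "path1", "path2", ... (numbering starts at 1)
paths = {f"path{n}": val for n, val in enumerate(segments, start=1)}
print(paths["path1"])  # firearms

# Option 2: tuple unpacking, usable when the number of segments is known in advance
path1, path2, path3 = segments
print(path3)  # semi-automatic-handguns
```

Unpacking raises a ValueError if the URL has a different number of segments, so the dictionary is the safer choice when the depth of the path varies.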