
Python: Get URL path sections


How do I get specific path sections from a URL? For example, I want a function which operates on this:

http://www.mydomain.com/hithere?image=2934

and returns "hithere"

or operates on this:

http://www.mydomain.com/hithere/something/else

and returns the same thing ("hithere")

I know this will probably use urllib or urllib2, but I can't figure out from the docs how to get only a section of the path.


Solution

  • Extract the path component of the URL with urlparse (Python 2.7):

    import urlparse
    path = urlparse.urlparse('http://www.example.com/hithere/something/else').path
    print path
    > /hithere/something/else
    

    Or use urllib.parse (Python 3):

    import urllib.parse
    path = urllib.parse.urlparse('http://www.example.com/hithere/something/else').path
    print(path)
    > /hithere/something/else
    

    Split off the last component with os.path.split (posixpath.split does the same on every OS, which suits URL paths since they always use '/'):

    >>> import os.path
    >>> os.path.split(path)
    ('/hithere/something', 'else')
    

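    The dirname and basename functions give you the two pieces of the split. To reach the first component, call dirname in a while loop, stopping at '/' (or at '' when the URL has no path, which would otherwise loop forever), then strip the leading slash:

    >>> while os.path.dirname(path) not in ('/', ''):
    ...     path = os.path.dirname(path)
    ...
    >>> path
    '/hithere'
    >>> path.lstrip('/')
    'hithere'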
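The urlparse step is also what handles the first URL in the question: the query string is stored in a separate field and never appears in .path. A quick Python 3 check using the question's example URL:

```python
from urllib.parse import urlparse

# urlparse splits a URL into scheme, netloc, path, params, query and fragment;
# the "?image=2934" part lands in .query, not in .path.
parts = urlparse('http://www.mydomain.com/hithere?image=2934')
print(parts.path)   # /hithere
print(parts.query)  # image=2934
```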
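Because os.path.split only peels off the last component, the dirname loop is needed to walk back to the first one. An alternative sketch (Python 3, not part of the original answer) gets every segment at once, either by splitting the path string on '/' or with pathlib.PurePosixPath, which always treats '/' as the separator regardless of the OS:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

path = urlparse('http://www.example.com/hithere/something/else').path

# Splitting on '/' gives an empty first element for the leading slash,
# so drop empty strings (this also drops empties from '//' or a trailing '/').
segments = [seg for seg in path.split('/') if seg]
print(segments)  # ['hithere', 'something', 'else']

# PurePosixPath keeps the root '/' as the first element of .parts.
print(PurePosixPath(path).parts)  # ('/', 'hithere', 'something', 'else')
```

With either form, the first real segment is segments[0] or PurePosixPath(path).parts[1].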
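Putting the pieces together, a minimal sketch of the function the question asks for might look like the following. The name first_path_segment is hypothetical, and returning None for a URL with no path segments is an assumption about the desired behavior:

```python
from urllib.parse import urlparse


def first_path_segment(url):
    """Return the first non-empty segment of the URL's path, or None.

    Hypothetical helper: the query string and fragment are ignored
    because urlparse keeps them out of .path.
    """
    path = urlparse(url).path
    for segment in path.split('/'):
        if segment:  # skip the empty string produced by the leading '/'
            return segment
    return None  # assumption: no segments means "nothing to return"


print(first_path_segment('http://www.mydomain.com/hithere?image=2934'))       # hithere
print(first_path_segment('http://www.mydomain.com/hithere/something/else'))  # hithere
print(first_path_segment('http://www.mydomain.com/'))                         # None
```

Segments come back still percent-encoded (for example, %20 stays as is); pass them through urllib.parse.unquote if decoded text is needed.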