Generate a combination of URL endpoints

I have a URL as follows:

and I have an endpoint named /potato.

I would like to generate the following URLs from these:

My attempts so far involved splitting on slashes, but they miss cases such as an endpoint that itself begins with a /.

What's the cleanest, most Pythonic way to accomplish this?


  • You can use a list comprehension:

    import re
    s = ''
    # split only on a '/' between two word characters, so the '//' after the scheme stays intact
    *path, _ = re.split(r'(?<=\w)/(?=\w)', s)
    results = [f'{"/".join(path[:2+i])}/potato' for i in range(len(path)-1)]


    ['', '', '']
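To make the lookaround split concrete, here is a minimal sketch run on a made-up URL. The address `https://example.com/a/b/c/page.html` is an assumption for illustration only, not the URL from the question.

```python
import re

# Hypothetical URL, used only for illustration.
s = 'https://example.com/a/b/c/page.html'

# A '/' is a split point only when a word character sits on both sides,
# which leaves the '//' after the scheme untouched.
parts = re.split(r'(?<=\w)/(?=\w)', s)
print(parts)

*path, _ = parts  # drop the final segment (here, the file name)
results = [f'{"/".join(path[:2 + i])}/potato' for i in range(len(path) - 1)]
print(results)
```

One caveat of this pattern: a segment that starts or ends with a non-word character (for example `-` or `.`) is not treated as a split point, so such URLs would probably need the `urllib.parse` approach instead.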

    Edit: Python2.7 Solution:

    import re
    s = ''
    path = re.split(r'(?<=\w)/(?=\w)', s)[:-1]
    result = ['{}/potato'.format("/".join(path[:1+i])) for i in range(len(path))]


    ['', '', '', '']

    Another, more robust option is to parse the URL with urllib.parse:

    import urllib.parse
    d = urllib.parse.urlsplit(s)
    _, *path, _ = d.path.split('/')
    result = [f'{d.scheme}://{d.netloc}/{"/".join(path[:i])}/potato' for i in range(1, len(path)+1)]


    ['', '', '']
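As a rough illustration of what `urlsplit` contributes, the sketch below prints the intermediate values for a made-up URL. The address is again an assumption, chosen only to show the shape of the data.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
s = 'https://example.com/a/b/c/page.html'

d = urlsplit(s)
print(d.scheme, d.netloc, d.path)

# d.path starts with '/', so the first split item is '' and the last is
# the file name; both are discarded by the starred unpacking.
_, *path, _ = d.path.split('/')
print(path)

result = [f'{d.scheme}://{d.netloc}/{"/".join(path[:i])}/potato'
          for i in range(1, len(path) + 1)]
print(result)
```

Because the scheme and host are pulled out before any splitting happens, this version does not depend on what characters surround each slash.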

    In Python2.7 with urlparse:

    import urlparse
    d = urlparse.urlparse(s)
    path = d.path.split('/')[1:-1]
    # start at 1 so the first URL contains one path segment (i = 0 would yield a double slash)
    result = ['{}://{}/{}/potato'.format(d.scheme, d.netloc, "/".join(path[:i])) for i in range(1, len(path)+1)]


    ['', '', '']

    Edit 2: Timings:

    Source for timings can be found here

    [Graph: timings of the re and urlparse solutions]

    From the graph, it appears that in the majority of cases urlparse is slower than re.
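A comparison of this kind can be reproduced with `timeit`. The sketch below is not the original benchmark; the URL, the function names `with_re` and `with_urlsplit`, and the repeat counts are all assumptions.

```python
import re
import timeit
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
URL = 'https://example.com/a/b/c/page.html'

def with_re(url):
    *path, _ = re.split(r'(?<=\w)/(?=\w)', url)
    return [f'{"/".join(path[:2 + i])}/potato' for i in range(len(path) - 1)]

def with_urlsplit(url):
    d = urlsplit(url)
    _, *path, _ = d.path.split('/')
    return [f'{d.scheme}://{d.netloc}/{"/".join(path[:i])}/potato'
            for i in range(1, len(path) + 1)]

# Both approaches should agree before their speed is compared.
assert with_re(URL) == with_urlsplit(URL)

timings = {}
for fn in (with_re, with_urlsplit):
    # Best of 5 runs of 2,000 calls each, in seconds.
    timings[fn.__name__] = min(timeit.repeat(lambda: fn(URL), number=2000, repeat=5))
    print(f'{fn.__name__}: {timings[fn.__name__]:.4f}s')
```

Results will vary by machine and Python version. Note also that CPython's `urlsplit` caches recent results, so timing repeated calls on a single string may make it look faster than it would be on varied input.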

    Edit 3: Generic solution:

    import re
    def generate_url_combos(s, endpoint):
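        # drop a trailing "name.ext" segment (with or without trailing slashes), then split on path slashes
        path = re.split(r'(?<=\w)/(?=\w)', re.sub(r'(?<=\w)/\w+\.\w+$|(?<=\w)/\w+\.\w+/+$', '', s).strip('/'))
        return ['{}/{}'.format("/".join(path[:1+i]), re.sub(r'^/|/+$', '', endpoint)) for i in range(len(path))]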
    tests = [('', '/potato'), ('', '/potato'), ('', 'potato'), ('', 'potato/'), ('', 'potato'), ('', 'potato'), ('', 'potato'), ('', '/potato'), ('', '/potato')]
    for a, b in tests:
       print(generate_url_combos(a, b))


    ['', '', '', '']
    ['', '', '', '']
    ['', '', '', '']
    ['', '', '', '']
    ['', '', '', '']

    Edit 4:

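    import re
    import urlparse  # Python 2.7; on Python 3 use `import urllib.parse as urlparse`
    def generate_url_combos(s, endpoint):
        d = urlparse.urlparse(s)
        endpoint = re.sub(r'^/+|/+$', '', endpoint)  # strip leading and trailing slashes
        path = list(filter(None, d.path.split('/')))  # drop empty segments
        if not path:
            # no path segments: only the root-level URL, returned as a list for consistency
            return ['{}://{}/{}'.format(d.scheme, d.netloc, endpoint)]
        # drop a trailing segment that looks like a file name (ends in ".ext")
        path = path[:-1] if re.findall(r'\.\w+$', path[-1]) else path
        return ['{}://{}/{}'.format(d.scheme, d.netloc, "/".join(path[:i] + [endpoint])) for i in range(len(path)+1)]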
    tests = [('', '/potato'), ('', '/potato'), ('', 'potato'), ('', 'potato/'), ('', 'potato'), ('', 'potato'), ('', 'potato'), ('', '/potato'), ('', '/potato')]
    for a, b in tests:
       print(generate_url_combos(a, b))


    ['', '', '', '']
    ['', '', '', '']
    ['', '', '', '']
    ['', '', '', '']
    ['', '', '', '']
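For Python 3, the same idea might be sketched as follows. This is a port under stated assumptions rather than the answer's own code: it uses `urllib.parse.urlsplit`, always returns a list, and is exercised on made-up URLs.

```python
import re
from urllib.parse import urlsplit

def generate_url_combos(s, endpoint):
    """Return the endpoint appended at every directory level of the URL s."""
    d = urlsplit(s)
    endpoint = endpoint.strip('/')              # tolerate '/potato', 'potato/', ...
    path = [p for p in d.path.split('/') if p]  # drop empty segments
    # Treat a last segment ending in '.ext' as a file name and drop it.
    if path and re.search(r'\.\w+$', path[-1]):
        path = path[:-1]
    base = f'{d.scheme}://{d.netloc}'
    return ['/'.join([base, *path[:i], endpoint]) for i in range(len(path) + 1)]

# Hypothetical inputs, used only for illustration.
print(generate_url_combos('https://example.com/a/b/c/page.html', '/potato'))
print(generate_url_combos('https://example.com/a/b/', 'potato/'))
print(generate_url_combos('https://example.com', 'potato'))
```

The file-name check is only a heuristic: a directory whose name contains a dot (such as a version number) would also be dropped, so it may need adjusting for real URLs.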