I have a URL as follows:
http://example.com/foo/bar/baz/file.php
and I have an endpoint named /potato.
I would like to generate the following URLs from these:
http://example.com/foo/potato
http://example.com/foo/bar/potato
http://example.com/foo/bar/baz/potato
My attempts so far involved splitting at slashes, but that misses cases such as the endpoint itself beginning with a /.
What's the cleanest and most Pythonic way to accomplish this?
You can use a list comprehension:
import re
s = 'http://example.com/foo/bar/baz/file.php'
*path, _ = re.split(r'(?<=\w)/(?=\w)', s)
results = [f'{"/".join(path[:2+i])}/potato' for i in range(len(path)-1)]
Output:
['http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
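To see why the split works, here is a small sketch of what the lookaround pattern returns for the sample URL. The second URL is a made-up example (not from the question) where a segment ends in a non-word character, so the pattern does not split there; treat it as a caveat of the regex approach rather than a full analysis.

```python
import re

# Split only on a '/' that has a word character on both sides,
# so the '//' after the scheme is left intact.
pattern = r'(?<=\w)/(?=\w)'

parts = re.split(pattern, 'http://example.com/foo/bar/baz/file.php')
print(parts)
# ['http://example.com', 'foo', 'bar', 'baz', 'file.php']

# Hypothetical URL: the '/' after 'a-' is preceded by '-', not a word character.
odd = re.split(pattern, 'http://example.com/a-/b')
print(odd)
# ['http://example.com', 'a-/b']
```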
Edit: Python 2.7 solution:
import re
s = 'http://example.com/foo/bar/baz/file.php'
path = re.split(r'(?<=\w)/(?=\w)', s)[:-1]
result = ['{}/potato'.format("/".join(path[:1+i])) for i in range(len(path))]
Output:
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
Another possibility, which parses the URL more robustly, is to use urllib.parse:
import urllib.parse
d = urllib.parse.urlsplit(s)
_, *path, _ = d.path.split('/')
result = [f'{d.scheme}://{d.netloc}/{"/".join(path[:i])}/potato' for i in range(1, len(path)+1)]
Output:
['http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
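If you would rather not rebuild the URL by hand, a minimal sketch along the same lines can let the parsed result reassemble itself. The helper name parent_endpoints is my own, and I am assuming you want any query string or fragment dropped from the generated URLs.

```python
from urllib.parse import urlsplit

def parent_endpoints(url, endpoint):
    """Hypothetical helper: one URL per parent directory of `url`."""
    parts = urlsplit(url)
    # Drop the empty leading segment and the trailing file name.
    dirs = parts.path.split('/')[1:-1]
    name = endpoint.strip('/')
    return [
        # SplitResult is a namedtuple, so _replace() swaps fields and
        # geturl() reassembles the URL.
        parts._replace(path='/' + '/'.join(dirs[:i] + [name]),
                       query='', fragment='').geturl()
        for i in range(1, len(dirs) + 1)
    ]

print(parent_endpoints('http://example.com/foo/bar/baz/file.php', '/potato'))
# ['http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
```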
In Python 2.7, with urlparse:
import urlparse
d = urlparse.urlparse(s)
path = d.path.split('/')[1:-1]
result = ['{}://{}/{}/potato'.format(d.scheme, d.netloc, "/".join(path[:i])) for i in range(1, len(path)+1)]
Output:
['http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
Edit 2: Timings:
Source for timings can be found here
From the graph, it appears that in the majority of cases, urlparse is slower than re.
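If you want to reproduce a comparison like this yourself, a rough sketch with timeit might look like the following. This is not the original benchmark: the function names with_re and with_urlsplit and the repetition count are my own choices, and it targets Python 3. Also keep in mind that, as far as I know, urllib.parse caches recently parsed URLs, so timing repeated calls on one string may understate its cost.

```python
import re
import timeit
from urllib.parse import urlsplit

s = 'http://example.com/foo/bar/baz/file.php'

def with_re(url):
    # Regex approach from above (Python 3 version).
    *path, _ = re.split(r'(?<=\w)/(?=\w)', url)
    return ['/'.join(path[:2 + i]) + '/potato' for i in range(len(path) - 1)]

def with_urlsplit(url):
    # urllib.parse approach from above.
    d = urlsplit(url)
    _, *path, _ = d.path.split('/')
    return ['{}://{}/{}/potato'.format(d.scheme, d.netloc, '/'.join(path[:i]))
            for i in range(1, len(path) + 1)]

# Both approaches should agree before their speed is compared.
assert with_re(s) == with_urlsplit(s)

for fn in (with_re, with_urlsplit):
    seconds = timeit.timeit(lambda: fn(s), number=10000)
    print(fn.__name__, seconds)
```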
Edit 3: Generic solution:
import re
def generate_url_combos(s, endpoint):
    # Remove a trailing file name (e.g. /file.php, with optional trailing slashes), then split the rest.
    path = re.split(r'(?<=\w)/(?=\w)', re.sub(r'(?<=\w)/\w+\.\w+$|(?<=\w)/\w+\.\w+/+$', '', s).strip('/'))
    return ['{}/{}'.format("/".join(path[:1+i]), re.sub('^/|/+$', '', endpoint)) for i in range(len(path))]
tests = [('http://example.com/foo/bar/baz/file.php/', '/potato'), ('http://example.com/foo/bar/baz/file.php', '/potato'),
         ('http://example.com/foo/bar/baz/file.php', 'potato'), ('http://example.com/foo/bar/baz/file.php', 'potato/'),
         ('http://example.com/foo/bar/baz/file.php//', 'potato'), ('http://example.com/', 'potato'),
         ('http://example.com', 'potato'), ('http://example.com/', '/potato'), ('http://example.com', '/potato')]
for a, b in tests:
    print(generate_url_combos(a, b))
Output:
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']
Edit 4:
import re
import urlparse  # Python 2; on Python 3 use: from urllib import parse as urlparse
def generate_url_combos(s, endpoint):
    d = urlparse.urlparse(s)
    endpoint = re.sub(r'^/+|/+$', '', endpoint)  # strip leading and trailing slashes once
    path = list(filter(None, d.path.split('/')))
    if not path:
        # Return a list here as well, so the result type is always the same.
        return ['{}://{}/{}'.format(d.scheme, d.netloc, endpoint)]
    # Drop the last segment if it looks like a file name (has an extension).
    path = path[:-1] if re.findall(r'\.\w+$', path[-1]) else path
    return ['{}://{}/{}'.format(d.scheme, d.netloc, "/".join(path[:i] + [endpoint])) for i in range(len(path)+1)]
tests = [('http://example.com/foo/bar/baz/file.php/', '/potato'), ('http://example.com/foo/bar/baz/file.php', '/potato'),
         ('http://example.com/foo/bar/baz/file.php', 'potato'), ('http://example.com/foo/bar/baz/file.php', 'potato/'),
         ('http://example.com/foo/bar/baz/file.php//', 'potato'), ('http://example.com/', 'potato'),
         ('http://example.com', 'potato'), ('http://example.com/', '/potato'), ('http://example.com', '/potato')]
for a, b in tests:
    print(generate_url_combos(a, b))
Output:
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']
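For Python 3, a port of the same idea could look roughly like this. It is a sketch, not a drop-in replacement: it keeps the same heuristic (a last segment with an extension is treated as a file name), which is an assumption that will misfire on directory names containing a dot.

```python
import re
from urllib.parse import urlsplit

def generate_url_combos(s, endpoint):
    d = urlsplit(s)
    # Keep only non-empty path segments, so repeated slashes are ignored.
    path = [p for p in d.path.split('/') if p]
    # Heuristic: a last segment with an extension (e.g. file.php) is a file name.
    if path and re.search(r'\.\w+$', path[-1]):
        path = path[:-1]
    endpoint = endpoint.strip('/')
    return ['{}://{}/{}'.format(d.scheme, d.netloc, '/'.join(path[:i] + [endpoint]))
            for i in range(len(path) + 1)]

print(generate_url_combos('http://example.com/foo/bar/baz/file.php/', '/potato'))
# ['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
print(generate_url_combos('http://example.com', 'potato/'))
# ['http://example.com/potato']
```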