In Python, how do I build urls where some of the path components might be user input or otherwise untrusted?
Is there a way to use f-strings that automatically provides escaping, similar to JavaScript's tagged template literals? Or maybe string.format?
I am looking for a pattern I can use multiple times on a larger project. I am looking for the "parameterized queries" of url building, as opposed to the "string concatenate and escape" approach.
Ideally, it would be some kind of url builder that also supports query parameters, or whatever else I might need down the road.
For example, maybe I have a url template like this:
url_template = "https://example.com/api/v1/user/{user_id}"
And I want to be able to take that url template and fill in a user_id value, but with any special characters escaped.
For example, if I had:
url_template = "https://example.com/api/v1/user/{user_id}"
user_id = "123/some-other-url?virus=veryyes"
final_url = build_it(url_template, { "user_id": user_id })
# final_url:
# https://example.com/api/v1/user/123%2Fsome-other-url%3Fvirus%3Dveryyes
I am aware of the function urllib.parse.quote, but that seems too low level to use in practice.
I also see a lot of suggestions to use urllib.parse.urljoin, but that seems like a bad idea, as it allows query parameters and multiple path segments.
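To make the concern concrete, here is a small sketch of how the two standard-library functions behave on the input from this question. The `evil.example` host is a made-up placeholder used only for illustration.

```python
from urllib.parse import quote, urljoin

base = "https://example.com/api/v1/user/"
user_id = "123/some-other-url?virus=veryyes"

# quote() treats '/' as safe by default, so the slash survives unescaped.
default_quoted = quote(user_id)
# Passing safe='' escapes the slash as well.
fully_quoted = quote(user_id, safe="")

# urljoin() resolves a relative reference against the base; it does no escaping.
joined = urljoin(base, user_id)
# An absolute URL as the second argument replaces the base entirely.
hijacked = urljoin(base, "https://evil.example/x")

print(default_quoted)  # 123/some-other-url%3Fvirus%3Dveryyes
print(fully_quoted)    # 123%2Fsome-other-url%3Fvirus%3Dveryyes
print(joined)          # https://example.com/api/v1/user/123/some-other-url?virus=veryyes
print(hijacked)        # https://evil.example/x
```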
The solution doesn't need to use a template like my example. I'd also be happy with a .append_single_path_segment() style api.
If you just want something that automatically URL-quotes its arguments, you can write a wrapper around urllib.parse.quote(), say like this:
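from urllib.parse import quote

# safe='' makes quote() escape '/' too (it is left unescaped by default).
URLQUOTE_ARGS = {'safe': ''}

def format_url(
    url_template: str,
    *args: str,
    **kwargs: str
) -> str:
    # Quote every positional and keyword argument before substitution.
    return url_template.format(
        *[quote(arg, **URLQUOTE_ARGS) for arg in args],
        **{k: quote(v, **URLQUOTE_ARGS) for k, v in kwargs.items()})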
Then:
format_url(url_template, user_id=user_id)
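As a quick check, here is the wrapper applied to the example from the question, written out as a self-contained script. The last two lines are an added observation: quote() does not escape dots, so a value such as ".." passes through unchanged, and you may want to reject such values separately.

```python
from urllib.parse import quote


def format_url(url_template: str, *args: str, **kwargs: str) -> str:
    # Quote every argument with safe='' so that '/' is escaped as well.
    return url_template.format(
        *[quote(arg, safe="") for arg in args],
        **{k: quote(v, safe="") for k, v in kwargs.items()},
    )


url_template = "https://example.com/api/v1/user/{user_id}"
user_id = "123/some-other-url?virus=veryyes"

final_url = format_url(url_template, user_id=user_id)
print(final_url)
# https://example.com/api/v1/user/123%2Fsome-other-url%3Fvirus%3Dveryyes

# Dots are unreserved characters, so quote() leaves ".." as it is.
dot_url = format_url(url_template, user_id="..")
print(dot_url)
# https://example.com/api/v1/user/..
```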
Note about typing: bytes might work equally well as str, but I don't want to go through the trouble of setting up a TypeVar for this example if it's not terribly relevant.
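If you would rather have the .append_single_path_segment() style from the question, with query parameters as well, the same idea could be sketched as a small builder. The UrlBuilder class and its method names are hypothetical, not something the standard library provides; it only combines urllib.parse.quote() for path segments with urllib.parse.urlencode() for the query string, and it assumes the base URL itself is trusted.

```python
from urllib.parse import quote, urlencode


class UrlBuilder:
    """Hypothetical helper for illustration; not part of the standard library."""

    def __init__(self, base: str) -> None:
        # The base is assumed to be trusted and is used as given.
        self._base = base.rstrip("/")
        self._segments = []
        self._query = []

    def append_single_path_segment(self, segment: str) -> "UrlBuilder":
        # quote() does not escape dots, so reject empty and dot segments explicitly.
        if segment in ("", ".", ".."):
            raise ValueError(f"refusing path segment {segment!r}")
        # safe='' escapes '/' too, so the value stays a single segment.
        self._segments.append(quote(segment, safe=""))
        return self

    def add_query_param(self, key: str, value: str) -> "UrlBuilder":
        self._query.append((key, value))
        return self

    def build(self) -> str:
        url = "/".join([self._base, *self._segments])
        if self._query:
            # urlencode() escapes keys and values (spaces become '+').
            url += "?" + urlencode(self._query)
        return url


url = (
    UrlBuilder("https://example.com/api/v1")
    .append_single_path_segment("user")
    .append_single_path_segment("123/some-other-url?virus=veryyes")
    .add_query_param("q", "a b&c")
    .build()
)
print(url)
# https://example.com/api/v1/user/123%2Fsome-other-url%3Fvirus%3Dveryyes?q=a+b%26c
```

Keeping escaping inside the builder means callers never handle quoted and unquoted strings side by side, which is the closest analogue to parameterized queries here.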