Search code examples
pythonsecurityurl

How to build a url with path components in Python securely


In Python, how do I build urls where some of the path components might be user input or otherwise untrusted?

Is there a way to use f-strings that automatically provides escaping, similar to Javascript's tagged template literals? Or maybe string.format?

I am looking for a pattern I can use multiple times on a larger project. I am looking for the "parameterized queries" of url building, as opposed to the "string concatenate and escape" approach.

Ideally, it would be some kind of url builder that also supports query parameters, or whatever else I might need down the road.

For example, maybe I have a url template like this:

url_template = "https://example.com/api/v1/user/{user_id}"

And I want to be able to take that url template and fill in a user_id value, but with any special characters escaped.

For example, if I had:

url_template = "https://example.com/api/v1/user/{user_id}"
user_id = "123/some-other-url?virus=veryyes"

final_url = build_it(url_template, { "user_id": user_id })
# final_url:
# https://example.com/api/v1/user/123%2Fsome-other-url%3Fvirus%3Dveryyes

I am aware of the function urllib.parse.quote but that seems too low level to use in practice.

I also see a lot of suggestions to use urllib.parse.urljoin but that seems like a bad idea, as it allows query parameters and multiple path segments.

The solution doesn't need to use a template like my example. I'd also be happy with a .append_single_path_segment() style api.


Solution

  • If you just want something that automatically URL-quotes its arguments, you can write a wrapper on urllib.parse.quote(), say like this:

    from urllib.parse import quote
    
    URLQUOTE_ARGS = {'safe': ''}
    
    def format_url(
        url_template: str,
        *args: str,
        **kwargs: str
    ) -> str:
        return url_template.format(
            *[quote(arg, **URLQUOTE_ARGS) for arg in args],
            **{k: quote(v, **URLQUOTE_ARGS) for k, v in kwargs.items()})
    

    Then:

    format_url(url_template, user_id=user_id)
    

    Note about typing: bytes might work equally well as str, but I don't want to go through the trouble of setting up a TypeVar for this example if it's not terribly relevant.