Python: How to URL-encode only one specific URL parameter?

I have some big URLs that contain a lot of URL parameters.

For my specific case, I need to URL-encode the value of one specific URL parameter (q) when the content after "q=" starts with a slash ("/").

Example URL:

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"

How can I URL-encode only the last part of the URL, i.e. the value of the "q" parameter?

The output of this example should be:

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22%20

I already tried a few different things with urllib.parse, but it doesn't work the way I want it to.

Thanks for your help!

Solution

Split the string on the &q= part and encode only the last piece:

from urllib import parse

url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
# split on "&q=" (not "&q=/") so the leading slash gets encoded as %2F too
base, value = url.split("&q=", 1)
encoded = parse.quote_plus(value)
encoded_url = f"{base}&q={encoded}"
print(encoded_url)

output

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22

Note that this differs slightly from the requested output: the expected output in the question ends with a URL-encoded space (%20), which isn't present in the input URL.
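The question only wants q encoded when its value starts with a slash, which the code above doesn't check. A minimal sketch adding that check, assuming (as above) that q is the last parameter and is preceded by "&"; encode_q_if_slash is a hypothetical helper name. str.partition is used so that a URL without "&q=" is returned unchanged instead of raising an error.

```python
from urllib import parse


def encode_q_if_slash(url: str) -> str:
    """Percent-encode everything after '&q=' only if that value starts with '/'.

    Assumes q is the last parameter and is not the first one after '?'.
    """
    base, sep, value = url.partition("&q=")
    if not sep or not value.startswith("/"):
        # no "&q=" found, or its value doesn't start with a slash: leave the URL alone
        return url
    return f"{base}&q={parse.quote_plus(value)}"


url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
print(encode_q_if_slash(url))
# https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22
print(encode_q_if_slash("https://www.exmple.com/test?test1=abc&q=abc"))
# https://www.exmple.com/test?test1=abc&q=abc
```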

EDIT

A comment showed a different requirement (other parameters can follow q), so the code needs to change a bit. The code below encodes only the value after q=. First split the URL into the base and the parameters, then iterate through the parameters to find the q= parameter and encode its value. Some f-string and join magic then gives a URL with the q parameter encoded. Note that this can break if an & is present in the value that needs to be encoded.

from urllib import parse

url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"&utm_source=test1&cpc=123&gclid=abc123'
# the query string starts after the first "?"; maxsplit=1 keeps any later "?" in the value
baseurl, parameters = url.split("?", 1)
newparameters = []
for parameter in parameters.split("&"):
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode only the value, keep the "q=" prefix
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)

output

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22&utm_source=test1&cpc=123&gclid=abc123
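The manual split on "?" also leaves any #fragment glued to the last parameter. As an alternative (a swapped-in technique, not what the code above does), urllib.parse.urlsplit separates scheme, host, path, query and fragment, and urlunsplit reassembles them; the parameter loop stays the same. encode_q is a hypothetical helper name, and the same caveat about & inside the value still applies.

```python
from urllib import parse


def encode_q(url: str) -> str:
    # urlsplit separates the query string from the path and any #fragment
    parts = parse.urlsplit(url)
    new_params = []
    for param in parts.query.split("&"):
        if param.startswith("q="):
            # percent-encode only the value of q
            new_params.append(f"q={parse.quote_plus(param[2:])}")
        else:
            new_params.append(param)
    # _replace swaps in the rebuilt query; urlunsplit puts the URL back together
    return parse.urlunsplit(parts._replace(query="&".join(new_params)))


print(encode_q('https://www.exmple.com/test?q=/a&utm_source=test1#top'))
# https://www.exmple.com/test?q=%2Fa&utm_source=test1#top
```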

EDIT 2

This edit tries to handle the edge case where the value to be encoded contains an & character, which breaks string.split("&").
I tried using urllib.parse.parse_qs(), but it has the same issue with the & character (see the urllib.parse documentation for reference).
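To illustrate the problem, here is the closely related urllib.parse.parse_qsl (it returns (name, value) pairs in order instead of a dict) applied to a shortened version of the example query. Every & is treated as a separator, so the pieces of q after an & become separate fields, and fields without = are silently dropped unless keep_blank_values=True.

```python
from urllib import parse

query = 'q=/"TEST"/&"TE&eeST"&utm_source=test1'

# the value of q is cut off at the first "&"
print(parse.parse_qsl(query))
# [('q', '/"TEST"/'), ('utm_source', 'test1')]

# with keep_blank_values=True the fragments show up as bogus parameter names
print(parse.parse_qsl(query, keep_blank_values=True))
# [('q', '/"TEST"/'), ('"TE', ''), ('eeST"', ''), ('utm_source', 'test1')]
```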

This question is a nice example of how edge cases can mess up simple logic and make it overly complicated.

RFC 3986 also doesn't place any restrictions on the names of query-string parameters; otherwise, that could have been used to narrow down possible errors even more.
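Since the RFC doesn't restrict parameter names, one option is to impose a restriction yourself. A sketch under a hypothetical assumption that every real parameter name contains only letters, digits and underscores: split only on an & that is followed by such a name and =, so any other & stays inside the value. split_params and NEW_PARAM are illustrative names; this still breaks if the value of q contains something like &abc=.

```python
import re
from urllib import parse

# ASSUMPTION: real parameter names consist only of letters, digits and "_".
# RFC 3986 does not require this; it is a restriction imposed for this sketch.
NEW_PARAM = re.compile(r"&(?=[A-Za-z0-9_]+=)")


def split_params(query: str) -> list[str]:
    # split only on "&" followed by "name=", so other "&" stay inside the value
    return NEW_PARAM.split(query)


query = 'test1=abc&q=/"TEST"/&"TE&eeST"&utm_source=test1&cpc=123'
print(split_params(query))
# ['test1=abc', 'q=/"TEST"/&"TE&eeST"', 'utm_source=test1', 'cpc=123']

# encode only the value of q, as in the answer's loop
params = [
    f"q={parse.quote_plus(p[2:])}" if p.startswith("q=") else p
    for p in split_params(query)
]
print("&".join(params))
# test1=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1&cpc=123
```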

updated code

from urllib import parse


url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/&"TE&eeST"&utm_source=test1&cpc=123&gclid=abc123'
# the query string starts after the first "?"; maxsplit=1 keeps any later "?" in the value
baseurl, parameters = url.split("?", 1)

# addition to handle & in the query string:
# a piece without "=" can't be a new parameter, so glue it back onto the previous one.
# This reduces errors, but it can still go wrong if a piece after an & inside the value contains a "=".
split_parameters = []
for parameter in parameters.split("&"):
    if "=" not in parameter and split_parameters:
        # add this part to the previous entry in split_parameters
        split_parameters[-1] += f"&{parameter}"
    else:
        split_parameters.append(parameter)


newparameters = []
for parameter in split_parameters:
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode only the value, keep the "q=" prefix
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)

output

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1&cpc=123&gclid=abc123
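A quick way to sanity-check the result is to parse the encoded URL back with urllib.parse: once the & characters inside q are encoded as %26, parse_qs splits the query correctly and decodes q back to the original value. The URL and value below are copied from the example above.

```python
from urllib import parse

original_q = '/"TEST"/&"TE&eeST"'
encoded_url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1&cpc=123&gclid=abc123'

# parse_qs returns a dict mapping each name to a list of decoded values
params = parse.parse_qs(parse.urlsplit(encoded_url).query)
print(params["q"])  # ['/"TEST"/&"TE&eeST"']
print(params["q"] == [original_q])  # True
```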