I have some big URLs that contain a lot of URL parameters.
For my specific case, I need to URL Encode the content of one specific URL Parameter (q) when the content after the "q=" starts with a slash ("/")
Example URL:
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"
How can I only URL encode that last part of the URL which is within the "q" parameter?
The output of this example should be:
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22%20
I already tried some different things with urllib.parse but it doesnt work the way I want it.
Thanks for your help!
split the string on the &q=/
part and only encode the last string
from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
encoded = parse.quote_plus(url.split("&q=/")[1])
encoded_url = f"{url.split('&q=/')[0]}&q=/{encoded}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22
Note that there's a difference between this and the requested output, but you have an url encoded space (%20
) at the end
EDIT
Comment shows a different need for the encoding, so the code needs to change a bit. The code below only encodes the part after &q=
. Basically, first split the url and the parameters, then iterate through the parameters to find the q=
parameter, and encode that part. Do some f-string and join magic and you get an url that has the q
parameter encoded. Note that this might have issues if an &
is present in the part that needs to be encoded.
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?")
newparameters = []
for parameter in parameters.split("&"):
# check if the parameter is the part that needs to be encoded
if parameter.startswith("q="):
# encode the parameter
newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
else:
# otherwise add the parameter unencoded
newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22&utm_source=test1&cpc=123&gclid=abc123
EDIT 2
Trying to solve the edge case where there's a &
character in the string to be encoded, as this messes up the string.split("&")
.
I tried using urllib.parse.parse_qs()
but this has the same issue with the &
character. Docs for reference.
This question is a nice example of how edge cases can mess up simple logic and make it overly complicated.
The RFC3986 also didn't specify any limitations on the name of the query string, otherwise that could've been used to narrow down possible errors even more.
updated code
from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/&"TE&eeST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?")
# addition to handle & in the querystring.
# it reduces errors, but it can still mess up if there's a = in the part to be encoded.
split_parameters = []
for index, parameter in enumerate(parameters.split("&")):
if "=" not in parameter:
# add this part to the previous entry in split_parameters
split_parameters[-1] += f"&{parameter}"
else:
split_parameters.append(parameter)
newparameters = []
for parameter in split_parameters:
# check if the parameter is the part that needs to be encoded
if parameter.startswith("q="):
# encode the parameter
newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
else:
# otherwise add the parameter unencoded
newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1&cpc=123&gclid=abc123