Tags: python, http, utf-8, python-requests, urlencode

Python requests module encodes query params differently from the intended URL


I'm having a problem with URL encoding when using the requests module in a Python project.

These are the two differently encoded versions of the same param value that I captured with Wireshark:

  1. 0900+%28%EB%8C%80%ED%95%9C%EB%AF%BC%EA%B5%AD+%ED%91%9C%EC%A4%80%EC%8B%9C%29
  2. 0900%20(%EB%8C%80%ED%95%9C%EB%AF%BC%EA%B5%AD%20%ED%91%9C%EC%A4%80%EC%8B%9C)

'1' is the URL encoded by the Python requests module, and '2' is the URL from the packet sent by a web browser. Decoding both yields the same UTF-8 text.
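The claim that both strings decode to the same text can be checked with the standard library. This is only a sketch: the variable names are illustrative, and it assumes that '1' uses form-style encoding (so `+` means a space) while '2' uses plain percent-encoding.

```python
from urllib.parse import unquote, unquote_plus

# Values copied from the two captured samples above.
from_requests = "0900+%28%EB%8C%80%ED%95%9C%EB%AF%BC%EA%B5%AD+%ED%91%9C%EC%A4%80%EC%8B%9C%29"
from_browser = "0900%20(%EB%8C%80%ED%95%9C%EB%AF%BC%EA%B5%AD%20%ED%91%9C%EC%A4%80%EC%8B%9C)"

# unquote_plus turns "+" into a space; unquote leaves "+" untouched.
decoded_requests = unquote_plus(from_requests)
decoded_browser = unquote(from_browser)

print(decoded_requests)
print(decoded_browser)
print(decoded_requests == decoded_browser)
```

So the two forms carry the same value; they differ only in how the space and the parentheses are written on the wire.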

It seems that spaces and parentheses are handled differently between the two. Is there a way to make requests produce '2' instead of '1'?

Here is the code I used to send the request:

import requests

_url = "http://something"
_headers = {
    'Accept': 'text/javascript',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'ko-KR',
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}
_params = {
    'action': 'log',
    'datetime': '0900 (대한민국 표준시)'
}

# This is the request part; requests encodes _params into the query string
session = requests.Session()
res = session.get(_url, headers=_headers, params=_params)

Solution

  • You can manually encode your _params to construct your query string and then concatenate it to your _url.

    You can use urllib.parse.urlencode (see the Python docs) to convert your _params dictionary to a percent-encoded ASCII string. The result is a series of key=value pairs separated by & characters, where both key and value are quoted using the quote_via function. By default, quote_via is quote_plus(), which encodes spaces as + and / as %2F, following the standard for GET requests (application/x-www-form-urlencoded). This is the form you see in '1'. Passing quote() as quote_via instead encodes spaces as %20 and leaves / characters unencoded. For finer control over what is quoted, use quote together with a value for safe: for example, safe="()" leaves parentheses unescaped, which matches the browser's output in '2'.

    
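    from urllib.parse import quote, urlencode
    import requests
    
    url_template = "http://something/?{}"
    _headers = { ... }  # same headers as in the question
    _params = {"action": "log", "datetime": "0900 (대한민국 표준시)"}
    
    # quote_via=quote encodes spaces as %20; safe="()" leaves parentheses unescaped
    _url = url_template.format(urlencode(_params, safe="()", quote_via=quote))
    
    # No params= argument here: the query string is already part of _url
    response = requests.get(_url, headers=_headers)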
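
To see the effect of quote_via and safe described above without sending a request, the query strings can be compared side by side. This is a minimal, standard-library-only sketch; the variable names (`form_style`, `percent_style`, `browser_style`) are made up for illustration.

```python
from urllib.parse import quote, urlencode

params = {"action": "log", "datetime": "0900 (대한민국 표준시)"}

# Default: quote_via=quote_plus, so spaces become "+" and parentheses are escaped.
form_style = urlencode(params)

# quote_via=quote: spaces become "%20", but parentheses are still escaped.
percent_style = urlencode(params, quote_via=quote)

# quote_via=quote with safe="()": spaces become "%20", parentheses stay literal.
browser_style = urlencode(params, quote_via=quote, safe="()")

for label, query in [
    ("quote_plus (default)", form_style),
    ("quote", percent_style),
    ('quote, safe="()"', browser_style),
]:
    print(f"{label}: {query}")
```

Only the last variant reproduces the value from sample '2'. The default variant reproduces sample '1', which suggests requests applies form-style quoting to params.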