
Unicode conversion issue while using API gateway


The following URL works as expected and returns "null".

https://zga2tn1wgd.execute-api.us-east-1.amazonaws.com/mycall?url=https://mr.wikipedia.org/s/4jp4

But the same page, requested with a URL that contains Unicode characters instead of only ASCII characters, throws an error:

"errorMessage": "'ascii' codec can't encode characters in position 10-20: ordinal not in range(128)", "errorType": "UnicodeEncodeError"

How do I encode the Unicode characters when passing the string to API Gateway?

https://zga2tn1wgd.execute-api.us-east-1.amazonaws.com/mycall?url=https://mr.wikipedia.org/wiki/%E0%A4%95%E0%A4%BF%E0%A4%B6%E0%A5%8B%E0%A4%B0%E0%A4%BE%E0%A4%B5%E0%A4%B8%E0%A5%8D%E0%A4%A5%E0%A4%BE
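The percent-escapes in the URL above are the UTF-8 bytes of the Marathi page title किशोरावस्था. A minimal Python 3 sketch shows the round trip between the escaped and decoded forms, and shows that encoding the decoded title with the ASCII codec raises the same kind of UnicodeEncodeError as in the message above (the variable names are only for illustration):

```python
from urllib.parse import quote, unquote

# Percent-encoded path segment taken from the failing URL
encoded = ("%E0%A4%95%E0%A4%BF%E0%A4%B6%E0%A5%8B%E0%A4%B0%E0%A4%BE"
           "%E0%A4%B5%E0%A4%B8%E0%A5%8D%E0%A4%A5%E0%A4%BE")

# unquote() decodes the %XX escapes as UTF-8 by default
title = unquote(encoded)
print(title)  # किशोरावस्था

# quote() re-encodes the text as UTF-8 and escapes every non-ASCII byte
assert quote(title) == encoded

# The decoded text is not ASCII, so encoding it with the ASCII codec fails
try:
    title.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)  # 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)
```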


I am using the following bookmarklet to generate the URL above:

javascript:(function(){location.href='https://z3nt6lcj40.execute-api.us-east-1.amazonaws.com/mycall?url='+encodeURIComponent(location.href);})();
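For testing the Lambda side without a browser, the bookmarklet's encodeURIComponent() step can be approximated in Python. This is a sketch, not an exact port: encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) unescaped and percent-encodes everything else as UTF-8, and urllib.parse.quote already keeps letters, digits and "_.-~", so only "!*'()" is added to safe. The helper name encode_uri_component is hypothetical, and the two can differ on malformed input (JavaScript throws on lone surrogates).

```python
from urllib.parse import quote


def encode_uri_component(text: str) -> str:
    """Approximate JavaScript's encodeURIComponent() (hypothetical helper).

    "/", ":", "?", "&", "=" and all non-ASCII characters are percent-encoded
    as UTF-8; letters, digits and - _ . ! ~ * ' ( ) are left as they are.
    """
    return quote(text, safe="!*'()")


page = "https://mr.wikipedia.org/wiki/किशोरावस्था"
param = encode_uri_component(page)
# Query string value as the bookmarklet would append it after "?url="
api_url = "https://z3nt6lcj40.execute-api.us-east-1.amazonaws.com/mycall?url=" + param
```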

Solution

  • Your Lambda function contains this line, which unquotes (percent-decodes) the URL

    url1 = urllib.parse.unquote(url)
    

    from

    'https://zga2tn1wgd.execute-api.us-east-1.amazonaws.com/mycall?url=https://mr.wikipedia.org/wiki/%E0%A4%95%E0%A4%BF%E0%A4%B6%E0%A5%8B%E0%A4%B0%E0%A4%BE%E0%A4%B5%E0%A4%B8%E0%A5%8D%E0%A4%A5%E0%A4%BE'
    

    to

    'https://zga2tn1wgd.execute-api.us-east-1.amazonaws.com/mycall?url=https://mr.wikipedia.org/wiki/किशोरावस्था'
    

    The non-US-ASCII parts of this result must be percent-encoded again before the request is made. Here they are in the query component.

    It is recommended to split the URI into its components and encode each one separately, so that encoding does not change its semantics (for example, by escaping the "/", "?", "&" and "=" delimiters).

    Here are a few more steps to take before making the request to the URL:

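    import urllib.parse

    url1 = urllib.parse.unquote(url)
    urlparts = urllib.parse.urlparse(url1)
    # parse_qs returns a dict of lists, e.g. {'url': ['https://...']}
    querypart = urllib.parse.parse_qs(urlparts.query)
    # doseq=True encodes each list item; without it the list's repr is encoded
    querypart_enc = urllib.parse.urlencode(querypart, doseq=True)

    # Rebuild the URL with the escaped query part
    url1 = urllib.parse.urlunparse((
        urlparts.scheme, urlparts.netloc,
        urlparts.path, urlparts.params,
        querypart_enc, urlparts.fragment
    ))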
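The snippet above re-encodes only the query component, so a URL whose path is non-ASCII (such as /wiki/किशोरावस्था when the Wikipedia page itself is requested) would still fail. A minimal sketch of the same component-by-component idea that also covers the path, assuming the host name is plain ASCII (an internationalized domain would additionally need IDNA encoding) and that the result is then handed to urllib.request or a similar client. The function name to_ascii_url and the PATH_SAFE constant are hypothetical:

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# Characters RFC 3986 allows unescaped in a path, plus "%" so that
# existing %XX escapes are not encoded a second time.
PATH_SAFE = "/%:@!$&'()*+,;="


def to_ascii_url(url: str) -> str:
    """Percent-encode the non-ASCII parts of `url`, one component at a time.

    Assumes the host name is already ASCII; IDNA encoding is not done here.
    """
    parts = urlsplit(url)
    # Encode the path but keep "/" and the other path delimiters intact.
    path = quote(parts.path, safe=PATH_SAFE)
    # Parse the query into (key, value) pairs and re-encode them, so that
    # "&" and "=" keep their role as separators.
    query = urlencode(parse_qsl(parts.query, keep_blank_values=True))
    # The fragment is not sent to the server, so it is passed through as-is.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

For example, urllib.request.urlopen(to_ascii_url(url1)) should then send an all-ASCII request line. Quoting each component with its own safe set, rather than quoting the whole string, is what keeps "://", "/", "?", "&" and "=" acting as delimiters.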