python, unicode, urllib, quotes, shift-jis

How to open a URL with non-UTF-8 arguments


Using Python, I need to send non-UTF-8 encoded data (specifically Shift-JIS) to a URL via the query string. How should I transfer the data? Quote it? Encode it in UTF-8?

Thanks


Solution

  • Query string parameters are byte-based. Whilst IRI-to-URI conversion and non-ASCII characters typed into a browser's address bar will typically use UTF-8, nothing forces you to send or receive your own parameters in that encoding.

    So for Shift-JIS (actually typically cp932, the Windows extension of that encoding):

    import urllib  # Python 2

    foo = u'\u65E5\u672C\u8A9E'  # 日本語
    # Encode to cp932 bytes first, then percent-encode those bytes.
    url = 'http://www.example.jp/something?foo=' + urllib.quote(foo.encode('cp932'))
    

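    In Python 3 you pass the encoding to the quote function itself:

    import urllib.parse

    foo = '\u65E5\u672C\u8A9E'  # 日本語
    # quote() encodes the str to cp932 bytes and percent-encodes the result.
    url = 'http://www.example.jp/something?foo=' + urllib.parse.quote(foo, encoding='cp932')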
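
If you have several parameters, or want to check what the receiving side would see, a round trip through the standard library can help. The following is only a sketch for Python 3: the helper names `build_url` and `read_param` are made up for illustration, and it assumes the server really does expect cp932 rather than UTF-8.

```python
from urllib.parse import parse_qs, urlencode

# Assumption: the receiving server decodes query parameters as cp932.
ENCODING = "cp932"


def build_url(base, params, encoding=ENCODING):
    """Hypothetical helper: append params to base, percent-encoding the
    values as bytes in the given encoding instead of the UTF-8 default."""
    return base + "?" + urlencode(params, encoding=encoding)


def read_param(url, name, encoding=ENCODING):
    """Hypothetical helper: decode one query parameter the way a server
    that expects the given encoding would."""
    query = url.partition("?")[2]
    return parse_qs(query, encoding=encoding)[name][0]


foo = "\u65E5\u672C\u8A9E"  # 日本語
url = build_url("http://www.example.jp/something", {"foo": foo})

# The URL itself is plain ASCII; only the percent-escaped bytes are cp932.
decoded = read_param(url, "foo")
print(url)
print(decoded == foo)
```

The point of the round trip is that both ends must agree on the encoding: decoding the same query string with the default UTF-8 would not give back the original text.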