Search code examples
pythonencodingstandardsdata-conversion

Convert JIS X 208 code to UTF-8 in Python


Let's say I have this Kanji "亜" which is represented in JIS X 208 code in hex form: 0x3021. I want my Python program to convert that code into its UTF-8 form E4BA9C so that I can pass that string (URL-encoded) into my url like this

http://jisho.org/api/v1/search/words?keyword=%E4%BA%9C

I'm using Python 2.7.12 but I'm open to Python 3 solution as well


Solution

  • These are accessed under ISO 2022 codec.

    >>> '亜'.encode('iso2022_jp')
    b'\x1b$B0!\x1b(B'
    

    If I saw those bytes not framed by the escape sequence, I would have to know which version of JIS X 0208 is being used, but I'm entirely pattern matching on Wikipedia at this point anyway.

    >>> b = b'\033$B' + bytes.fromhex('3021')
    >>> c = b.decode('iso2022_jp')
    >>> c
    '亜'
    >>> urllib.parse.quote(c)
    '%E4%BA%9C'
    

    (This is Python 3.)