In my program I get shift-jis character codes as Python integers which I need to convert to their corresponding utf8 character codes (which should also be in integers). How can I do that? For ASCII you have the helpful functions ord()/chr() which allows you to convert an integer into an ASCII string which you can easily convert to unicode later. I can't find anything like that for other encodings.
Using Python 2.
EDIT: the final code. Thanks everyone:
def shift_jis2unicode(charcode): # charcode is an integer
if charcode <= 0xFF:
string = chr(charcode)
else:
string = chr(charcode >> 8) + chr(charcode & 0xFF)
return ord(string.decode('shift-jis'))
print shift_jis2unicode(8140)
There's no such thing as "utf8 character codes (which should also be in integers)".
Unicode defines "code points", which are integers. UTF-8 defines how to convert those code points to an array of bytes.
So I think you want the Unicode code points. In that case:
def shift_jis2unicode(charcode): # charcode is an integer
if charcode <= 0xFF:
shift_jis_string = chr(charcode)
else:
shift_jis_string = chr(charcode >> 8) + chr(charcode & 0xFF)
unicode_string = shift_jis_string.decode('shift-jis')
assert len(unicode_string) == 1
return ord(unicode_string)
print "U+%04X" % shift_jis2unicode(0x8144)
print "U+%04X" % shift_jis2unicode(0x51)
(Also: I don't think 8100 is a valid shift-JIS character code...)