Even though I have #coding=utf-8 at the top of my .py file and convert Cyrillic strings to UTF-8 before passing them to the console, it still gives me:
  File "C:\Python27\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 16: character maps to <undefined>
What else can I do?
This is my to_utf8 function:
def to_utf8(obj):
    if isinstance(obj, dict):
        return dict([(to_utf8(key), to_utf8(value)) for key, value in obj.iteritems()])
    elif isinstance(obj, list):
        return [to_utf8(element) for element in obj]
    elif isinstance(obj, unicode):
        return obj.encode('utf-8')
    else:
        return obj
You are going the wrong way. The bytes in your str are obviously UTF-8. However, Python does not care what is in a str: from Python's viewpoint, a sequence of UTF-8-encoded Unicode code points is just another sequence of bytes.
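To see that a str holding UTF-8 is just bytes, compare its length with the length of the text it encodes. This is a minimal sketch using only built-ins; it runs on Python 2.7 (where the encoded value is a plain str) and on Python 3 (where it is bytes).

```python
# -*- coding: utf-8 -*-
text = u"СТАЛИНГРАД"          # unicode: 10 characters
raw = text.encode("utf-8")    # UTF-8 bytes (a plain str in Python 2)

# Each Cyrillic capital letter (U+0410..U+042F) takes two bytes in UTF-8,
# so the byte string is twice as long as the text.
print(len(text))  # 10
print(len(raw))   # 20

# Nothing in raw records that it came from UTF-8; to get the text back,
# you have to name the encoding yourself.
print(raw.decode("utf-8") == text)  # True
```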
One question remains open: for reasons I don't know, it tries to decode the bytes as cp1252.
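This does not explain why cp1252 gets picked, but it does explain the error message itself. In UTF-8, the Cyrillic capital А (U+0410) is the two bytes D0 90, and 0x90 is one of the few byte values that cp1252 leaves undefined. A minimal sketch that reproduces the same UnicodeDecodeError with only the standard codecs (which byte sits at position 16 of the asker's own string is not known here):

```python
# -*- coding: utf-8 -*-
raw = u"А".encode("utf-8")    # Cyrillic capital A, U+0410 -> bytes D0 90

error = None
try:
    raw.decode("cp1252")      # misread the UTF-8 bytes as Windows-1252
except UnicodeDecodeError as exc:
    error = exc               # keep a reference; Python 3 clears exc after the block
    print(exc)  # 'charmap' codec can't decode byte 0x90 in position 1: character maps to <undefined>

# Decoding with the encoding the bytes were actually written in works.
print(raw.decode("utf-8") == u"А")  # True
```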
If you spoon-feed the UTF-8 decoding to Python, it works. Equally, if you prefix the literal with u, Python knows what is in the character sequence (it is of type unicode now). In short: str != unicode != utf8.
# -*- coding: utf-8 -*-
import wx

# works: explicitly decode the UTF-8 bytes to unicode
mystr = "СТАЛИНГРАД".decode('utf8')
# this also works: a unicode literal
mystr = u"СТАЛИНГРАД"
# uncomment to make the code fail (a plain str holding UTF-8 bytes)
#mystr = "СТАЛИНГРАД"

app = wx.App(0)
frm = wx.Frame(None, -1, mystr)
frm.Show()
app.MainLoop()
wxPython 3.0 is Unicode-only and accepts both UTF-8 and unicode.
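Applied to the question's data, "going the other way" means decoding byte strings to unicode instead of encoding unicode to bytes, as to_utf8 does. Below is a minimal sketch of such a reverse helper. It assumes the nested dicts and lists hold UTF-8 byte strings; the name to_unicode is hypothetical and is not part of wxPython or the standard library. It runs on Python 2.7 (where bytes is an alias for str) and on Python 3.

```python
# -*- coding: utf-8 -*-
def to_unicode(obj, encoding="utf-8"):
    """Hypothetical helper: recursively decode byte strings in dicts/lists to unicode."""
    if isinstance(obj, dict):
        return dict((to_unicode(key, encoding), to_unicode(value, encoding))
                    for key, value in obj.items())
    if isinstance(obj, list):
        return [to_unicode(element, encoding) for element in obj]
    if isinstance(obj, bytes):
        # Byte strings are decoded with the encoding they were written in.
        return obj.decode(encoding)
    # unicode objects, numbers, None, etc. pass through unchanged.
    return obj


data = {u"city".encode("utf-8"): [u"СТАЛИНГРАД".encode("utf-8"), 1943]}
print(to_unicode(data) == {u"city": [u"СТАЛИНГРАД", 1943]})  # True
```

The result can then be handed to a widget as in the example above, for instance wx.Frame(None, -1, to_unicode(title)), so wxPython never has to guess the encoding.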