I am trying to read Cyrillic characters from a JSON file and then output them to the console using Python 3.4.3 on Windows. A plain print('Russian smth буквы') works as intended.
But when I print JSON contents it seems to print in Windows-1251 - "СЂСѓСЃСЃРєРёРµ Р±СѓРєРІС‹" (though my console, my JSON file and my .py (with coding comment) are in UTF-8).
I've tried re-encoding the file to Win-1251 and setting the console to Win-1251, but still no luck.
My JSON (encoded in UTF-8):
{
    "русские буквы": "что-то ещё на русском",
    "english letters": "и что-то на великом"
}
My code to load dictionary:
def load_dictionary():
    global Dictionary, isFatal
    try:
        with open(DictionaryName) as f:
            Dictionary = json.load(f)
    except Exception as e:
        logging.critical('Error loading dictionary: ' + str(e))
        isFatal = True
        return
    logging.info('Dictionary was loaded successfully')
I am trying to output it in 2 ways (both show the same gibberish):
print(helper.Dictionary.get('rly'))
print(helper.Dictionary)
An interesting add-on: I've added the whole Russian alphabet to my JSON file and loading seems to get stuck at the "С с" letter (Error loading dictionary: 'charmap' codec can't decode byte 0x81 in position X: character maps to <undefined>). If I remove this one letter, no exception is raised, but the problem above remains.
"But when I print JSON contents …"
If you print it using the type command, then you get mojibake СЂСѓСЃСЃРєРёРµ … instead of русские … under CHCP 1251 scope.
Try type under CHCP 65001 (i.e. UTF-8) scope.
Follow nauer's advice, use open(DictionaryName, encoding="utf8").
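A minimal sketch of that advice, not the asker's actual program: it writes a small UTF-8 JSON file to a temporary path, loads it once with an explicit encoding, and once with cp1252. Using cp1252 as the stand-in for the default encoding is an assumption; it is chosen here because byte 0x81 (the second byte of "с" in UTF-8) has no mapping in that code page, which matches the 'charmap' error quoted in the question. The sample data and temporary path are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the asker's dictionary file.
sample = {"русские буквы": "что-то ещё на русском", "rly": "русский"}

# Save the sample as UTF-8 JSON in a temporary file.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

# Explicit encoding: the UTF-8 bytes are decoded as intended.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

# Assumed legacy default (cp1252): decoding stops at byte 0x81.
legacy_error = None
try:
    with open(path, encoding="cp1252") as f:
        json.load(f)
except UnicodeDecodeError as e:
    legacy_error = e

os.remove(path)

print(loaded == sample)   # round trip succeeded with utf-8
print(legacy_error)       # the 'charmap' decode error message
```

Passing the encoding explicitly removes the dependence on whatever the locale default happens to be on a given machine.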
Example (39755662.json is saved with UTF-8 encoding):
==> chcp 866
Active code page: 866
==> type 39755662.json
{
"╤А╤Г╤Б╤Б╨║╨╕╨╡ ╨▒╤Г╨║╨▓╤Л": "╤З╤В╨╛-╤В╨╛ ╨╡╤Й╤С ╨╜╨░ ╤А╤Г╤Б╤Б╨║╨╛╨╝",
"rly": "╤А╤Г╤Б╤Б╨║╨╕╨╣"
}
==> chcp 1251
Active code page: 1251
==> type 39755662.json
{
"русские буквы": "что-то ещё на русском",
"rly": "СЂСѓСЃСЃРєРёР№"
}
==> chcp 65001
Active code page: 65001
==> type 39755662.json
{
"русские буквы": "что-то ещё на русском",
"rly": "русский"
}
==>
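The same effect can be reproduced without a console, as a rough illustration: the bytes never change, only the code page used to interpret them does. The sketch below takes the UTF-8 bytes of one value from the file and decodes them with each code page from the transcript. The variable names are illustrative, and ascii() is used so the output stays printable whatever the console encoding is.

```python
original = "русский"
utf8_bytes = original.encode("utf-8")  # what is actually stored in the file

# Interpret the same bytes with each code page from the transcript above.
as_cp866 = utf8_bytes.decode("cp866")
as_cp1251 = utf8_bytes.decode("cp1251")
as_utf8 = utf8_bytes.decode("utf-8")

for name, text in [("cp866", as_cp866), ("cp1251", as_cp1251), ("utf-8", as_utf8)]:
    # Show the code points and whether the text survived intact.
    print(name, ascii(text), text == original)
```

Only the utf-8 row compares equal to the original; the other two are mojibake that can be reversed by re-encoding with the same code page and decoding as UTF-8.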