Tags: python, json, python-3.x, utf-8, cyrillic

JSON printed to console shows wrong encoding


I am trying to read Cyrillic characters from a JSON file and then print them to the console using Python 3.4.3 on Windows. A plain print('Russian smth буквы') works as intended.

But when I print JSON contents it seems to print in Windows-1251 - "СЂСѓСЃСЃРєРёРµ Р±СѓРєРІС‹" (though my console, my JSON file and my .py (with coding comment) are in UTF-8).

I've tried re-encoding it to Win-1251 and setting console to Win-1251, but still no luck.

My JSON (Encoded in UTF-8):

{
  "русские буквы": "что-то ещё на русском",
  "english letters": "и что-то на великом"
}

My code to load dictionary:

def load_dictionary():
    global Dictionary, isFatal
    try:
        with open(DictionaryName) as f:
            Dictionary = json.load(f)
    except Exception as e:
        logging.critical('Error loading dictionary: ' + str(e))
        isFatal = True
        return
    logging.info('Dictionary was loaded successfully')

I am trying to output it in 2 ways (both show the same gibberish):

print(helper.Dictionary.get('rly'))
print(helper.Dictionary)

An interesting add-on: I've added the whole Russian alphabet to my JSON file and loading seems to get stuck at the letter "С с" (Error loading dictionary: 'charmap' codec can't decode byte 0x81 in position X: character maps to <undefined>). If I remove this one letter, no exception is raised, but the problem above remains.


Solution

  • "But when I print JSON contents …"

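    If you print the file with the type command under CHCP 1251, you get the mojibake СЂСѓСЃСЃРєРёРµ … instead of русские …, because the file's UTF-8 bytes are rendered as Windows-1251.

    Try type under CHCP 65001 (i.e. UTF-8) instead.

    In Python, follow nauer's advice and use open(DictionaryName, encoding="utf8"). Without an explicit encoding, open() falls back to the locale's preferred encoding, which on Windows is an ANSI code page rather than UTF-8. That is also where the 'charmap' codec error comes from.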

    Example (39755662.json is saved with UTF-8 encoding):

    ==> chcp 866
    Active code page: 866
    
    ==> type 39755662.json
    {
      "╤А╤Г╤Б╤Б╨║╨╕╨╡ ╨▒╤Г╨║╨▓╤Л": "╤З╤В╨╛-╤В╨╛ ╨╡╤Й╤С ╨╜╨░ ╤А╤Г╤Б╤Б╨║╨╛╨╝",
      "rly": "╤А╤Г╤Б╤Б╨║╨╕╨╣"
    }
    
    ==> chcp 1251
    Active code page: 1251
    
    ==> type 39755662.json
    {
      "русские буквы": "что-то ещё на русском",
      "rly": "СЂСѓСЃСЃРєРёР№"
    }
    
    ==> chcp 65001
    Active code page: 65001
    
    ==> type 39755662.json
    {
      "русские буквы": "что-то ещё на русском",
      "rly": "русский"
    }
    
    ==>
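
The same effect can be reproduced and fixed from Python. This is only a minimal sketch, not the asker's actual module: load_dictionary here takes a path argument instead of the question's globals, and SAMPLE, mojibake and demo are hypothetical names introduced for illustration. It shows that decoding UTF-8 bytes as cp1251 yields the gibberish, and that passing encoding="utf-8" to open() loads the text intact.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the question's dictionary file.
SAMPLE = {"русские буквы": "что-то ещё на русском"}


def load_dictionary(path):
    """Load a JSON file, decoding it explicitly as UTF-8."""
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which on Windows is typically an ANSI code page, not UTF-8.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mojibake(text, wrong_codec="cp1251"):
    """Return what UTF-8 bytes look like when decoded with the wrong codec."""
    return text.encode("utf-8").decode(wrong_codec, errors="replace")


def demo():
    """Write SAMPLE as UTF-8 JSON to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dictionary.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(SAMPLE, f, ensure_ascii=False)
        return load_dictionary(path)


if __name__ == "__main__":
    # True if the Cyrillic text survived the round trip unchanged.
    print(demo() == SAMPLE)
    # ascii() keeps the output printable on any console code page.
    print(ascii(mojibake("русские буквы")))
```

Whether the loaded text then prints correctly still depends on the console: the active code page and the console font must both be able to represent Cyrillic.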