
Printing and writing Unicode characters in Python


I am trying to print some Unicode characters or write them to a text file, and I am running into errors. Please advise: Googling gave me a few hints, but those produced errors too. My code is below. What might I be doing wrong here?

Eventually I want to use requests to fetch and parse JSON data that contains Unicode values, starting with this URL:

https://api.discogs.com/releases/7828220

import requests
import json
url = 'https://api.discogs.com/releases/7828220'
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0' }
art = requests.get(url, headers=headers)
json_object = json.loads(art.text)
try:
    print str(json_object['companies'][0][name])
except:
    print "Genre list isn't defined"

    {u'name': u'\u041e\u041e\u041e "\u041f\u0430\u0440\u0430\u0434\u0438\u0437"', u'entity_type': u'10', u'catno': u'PARAD-432', u'resource_url': u'https://api.discogs.com/labels/210403', u'id': 210403, u'entity_type_name': u'Manufactured By'}

Here json_object['companies'][0][name] contains Unicode characters that won't display on the command-line terminal, and won't write to a file as the required Unicode output either.

The value should look like ООО "Парадиз".

How can I get Python to display these values as they appear?


Solution

  • won't display on the command line terminal

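    What errors do you get? In any event, the following works on a terminal that supports UTF-8, such as on Linux, once you remove the unnecessary str() conversion and quote 'name'. In Python 2, str() tries to encode the text as ASCII and fails on Cyrillic, the unquoted name raises a NameError, and the bare except hides both: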

    import requests
    import json
    
    url = 'https://api.discogs.com/releases/7828220'
    headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0' }
    art = requests.get(url, headers=headers)
    json_object = json.loads(art.text)
    print json_object['companies'][0]['name']
    

    Output:

    ООО "Парадиз"
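
    To see why the str() call gets in the way, here is a minimal sketch. It assumes a trimmed, hard-coded copy of the response from the question instead of a live request, and ascii_encode_fails is a hypothetical helper name. An explicit .encode('ascii') stands in for what Python 2's str() attempts on a unicode object, so the snippet behaves the same on Python 2 and 3:

```python
from __future__ import print_function
import json

# Trimmed sample of the API response (an assumption for illustration);
# the \uXXXX escapes are ordinary JSON escapes.
raw = '{"companies": [{"name": "\\u041e\\u041e\\u041e \\"\\u041f\\u0430\\u0440\\u0430\\u0434\\u0438\\u0437\\""}]}'
name = json.loads(raw)['companies'][0]['name']

# The parsed value already holds the real Cyrillic characters; the \u escapes
# only appear in the repr of the surrounding dict.
print(len(name))  # 13 characters


def ascii_encode_fails(text):
    """Return True if text cannot be encoded as ASCII (hypothetical helper)."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False


print(ascii_encode_fails(name))  # True
```

    So the data is fine after json.loads; the failure only happens when something forces it through the ASCII codec.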
    

    On Windows, the command console may not default to an encoding that supports the characters you are trying to print. One easy fix is to switch the console to a code page that does: in this case, running chcp 1251 selects a code page that supports Russian (Cyrillic), and makes the code above work.
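
    If you are unsure whether the current console can show the text, one option is to check before printing. The sketch below is only an illustration: can_encode is a hypothetical helper, and cp437 is used as an example of a code page without Cyrillic. It tests the string against the encoding that print would use:

```python
from __future__ import print_function
import sys

NAME = u'\u041e\u041e\u041e "\u041f\u0430\u0440\u0430\u0434\u0438\u0437"'


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# sys.stdout.encoding is what print uses for the console; it can be None
# when output is piped (Python 2), so fall back to ASCII for the check.
console = getattr(sys.stdout, 'encoding', None) or 'ascii'
print(console, can_encode(NAME, console))

# Code page 437 has no Cyrillic letters; code page 1251 and UTF-8 do.
for codec in ('cp437', 'cp1251', 'utf-8'):
    print(codec, can_encode(NAME, codec))
```

    If the first line reports False, changing the code page as described above (or writing to a file instead) is the likely way out.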

    To write it to a file, use io.open with an explicit encoding:

    import io
    with io.open('output.txt','w',encoding='utf8') as f:
        f.write(json_object['companies'][0]['name'])
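
    To confirm that the file really contains the expected text, you can read it back with the same encoding. This is a rough sketch: roundtrip is a hypothetical helper, and it writes to a temporary file rather than output.txt so that nothing is left behind:

```python
from __future__ import print_function
import io
import os
import tempfile

NAME = u'\u041e\u041e\u041e "\u041f\u0430\u0440\u0430\u0434\u0438\u0437"'


def roundtrip(text, encoding='utf8'):
    """Write text to a temporary file and read it back with the same encoding."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)  # reopen through io.open so the encoding is applied
    try:
        with io.open(path, 'w', encoding=encoding) as f:
            f.write(text)
        with io.open(path, 'r', encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


print(roundtrip(NAME) == NAME)
```

    The same encoding has to be used by whatever opens the file later; a viewer that assumes a different encoding will show garbled characters even though the file is correct.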