python unicode cloud9-ide python-unicode

Cloud9 Unicode Error - Import Sys Does Not Work


I get the error below when I run the following Python code in Cloud9 IDE using the default version of Python (2.7.6):

import urllib
artistValue = "Sigur Rós"
artistValueUrl = urllib.quote(artistValue)

SyntaxError: Non-ASCII character '\xc3' in file /home/ubuntu/workspace/test.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

I read that changing the code as follows is a workaround:

import urllib
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
artistValue = "Sigur Rós"
artistValueUrl = urllib.quote(artistValue)

When I tried this, a red X pop-up error appeared in the editor that read:

Module 'sys' has no 'setdefaultencoding' member

If I run the code, I still get the SyntaxError.

Why is this happening and what should I do?

EDIT: I also tried the following from the selected answer:

import urllib
print urllib.quote(u"Sigur Rós")

When I ran it I received the following error:

SyntaxError: Non-ASCII character '\xc3' in file /home/ubuntu/workspace/test.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details


Solution

  • OK, that's a bit weird. The Python interpreter should raise a SyntaxError complaining about the non-ASCII character in your source code if you don't declare an encoding at the start of the script. OTOH, if you have declared UTF-8 as the encoding (or Cloud9 does it automatically), then the interpreter ought to treat the literal as a UTF-8 encoded string.

    I'm not familiar with Cloud9, so I can't guarantee that this will work, but it ought to. :)

    Make your string a Unicode string (by using the u string prefix) and then explicitly encode it to UTF-8:

    import urllib
    
    artistValue = u"Sigur Rós"
    artistValueUrl = urllib.quote(artistValue.encode('utf-8'))
    print artistValueUrl
    

    output

    Sigur%20R%C3%B3s
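
For what it's worth, here is a minimal sketch of the same idea that should run under both Python 2 and Python 3. The try/except import and the snake_case names are my own additions, not part of the original code. The first line is the PEP 263 encoding declaration, which Python 2 needs whenever the source file contains a non-ASCII literal.

```python
# -*- coding: utf-8 -*-
# Assumption: this file is saved as UTF-8, matching the declaration above.
try:
    from urllib import quote        # Python 2 location
except ImportError:
    from urllib.parse import quote  # Python 3 location

# Start from a Unicode string, then encode explicitly to UTF-8 bytes
# so that quote() percent-encodes the UTF-8 byte sequence.
artist_value = u"Sigur Rós"
artist_value_url = quote(artist_value.encode("utf-8"))
print(artist_value_url)
```

The explicit .encode('utf-8') keeps the result independent of whatever default encoding the interpreter happens to use.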
    

    edit

    What happens if you run this:

    # -*- coding: utf-8 -*-
    import urllib
    print urllib.quote("Sigur Rós")
    

    The following should work, because the source then contains only ASCII characters. Of course, this isn't a practical way to enter such strings into your script; I'm just trying to get a handle on what Cloud9 is doing.

    import urllib
    print urllib.quote("Sigur R\xc3\xb3s")
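
The '\xc3' in the error message is the first of the two bytes that ó becomes when it is encoded as UTF-8, which is why the escaped literal above stands in for the original string. A quick check, as a sketch (the variable names are only illustrative):

```python
# -*- coding: utf-8 -*-
# Assumption: this file is saved as UTF-8, matching the declaration above.
# The letter ó (U+00F3) occupies two bytes in UTF-8: 0xC3 0xB3.
encoded = u"Sigur Rós".encode("utf-8")

# The same bytes written with ASCII-only escapes.
escaped = b"Sigur R\xc3\xb3s"

print(encoded == escaped)
```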
    

    And I guess you might as well also try this, just so we can see what error message it produces:

    import urllib
    print urllib.quote(u"Sigur Rós")