python, utf-8, pycurl

PycURL: UTF-8 encoding problem when fetching a JSON file


I'm facing a problem with my PycURL request. My JSON file on the server is encoded in UTF-8 and looks like this:

{
  "address" : "123 rue de Labège"
}

I use PycURL to fetch this JSON and save it to a new file on my computer. I am on Python 2.7, and here is my PycURL setup:

import pycurl

def setup(self, _url, _method, _login, _passwd, _path, *args, **kwargs):
    self.curl = pycurl.Curl()
    self.url = 'https://%s:%d/' % (self.ip, self.port) + _url
    self.method = _method
    self.userpwd = '%s:%s' % (_login, _passwd)
    self.path = _path

    self.curl.setopt(pycurl.URL, self.url)

    curl_method = {
        "GET": pycurl.HTTPGET,
        "POST": pycurl.POST
    }

    if self.method in curl_method:
        self.curl.setopt(curl_method[self.method], 1)
    else:
        self.curl.setopt(pycurl.CUSTOMREQUEST, self.method)

    self.curl.setopt(pycurl.SSL_VERIFYPEER, 0)
    self.curl.setopt(pycurl.SSL_VERIFYHOST, 0)
    self.curl.setopt(pycurl.HTTPAUTH, pycurl.HTTPAUTH_BASIC)
    self.curl.setopt(pycurl.USERPWD, self.userpwd)

    if _url == 'MY_FILE_JSON':
        filename = 'file.json'
        self.file = open(self.path + filename, 'wb')
        self.curl.setopt(pycurl.WRITEDATA, self.file)

The problem is in the file I get:

{
  "address" : "123 rue de Lab\u00e8ge"
}

I don't understand why PycURL is encoding my "è" as \u00e8. Is there a setopt option in PycURL that forces it to write the correct character?


Solution

  • This is actually correct: \u00e8 is the escape sequence for "è", and once you print the value of the property, you can see it comes out fine.

    It is simply an escaped representation of a Unicode string, which both JSON and Python understand. PycURL writes the response body to the file as it is received, so the escape is most likely already present in what the server sends. Once the JSON is parsed, the property is a Unicode string containing "è".

    Check this article out for more information.

    So to recap, if you do:

    >>> test = u'123 rue de Lab\u00e8ge'
    >>> print(test)
    123 rue de Labège
    

    Here you can see I create a Unicode string (note the u prefix); the \u00e8 escape in the literal is printed as "è".
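
If you really want the literal "è" in the file on disk, one option is to parse the downloaded JSON and write it back without ASCII escaping. The following is only a minimal sketch: the sample string, the rewrite_json helper and its path arguments are made up for illustration and are not part of the original setup. It relies on the standard json module's ensure_ascii=False option.

```python
# -*- coding: utf-8 -*-
import io
import json

# Illustrative sample: JSON text containing the \u00e8 escape,
# similar to what ends up in the downloaded file.
raw = '{"address": "123 rue de Lab\\u00e8ge"}'

# Parsing decodes the escape back into the real character.
data = json.loads(raw)
address = data["address"]

# ensure_ascii=False keeps non-ASCII characters as-is
# instead of escaping them as \uXXXX.
readable = json.dumps(data, ensure_ascii=False)


def rewrite_json(src_path, dst_path):
    """Hypothetical helper: re-save a JSON file with unescaped UTF-8 text."""
    with io.open(src_path, encoding="utf-8") as src:
        content = json.load(src)
    text = json.dumps(content, ensure_ascii=False, indent=2)
    if isinstance(text, bytes):
        # On Python 2, json.dumps can return a byte string for pure-ASCII data.
        text = text.decode("utf-8")
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

Calling rewrite_json on the downloaded file after the transfer has finished and the file has been closed should produce a copy with "è" written as a real UTF-8 character. Both versions are equivalent JSON, so this step only matters if a human needs to read the file.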