Tags: python, json, curl, unicode, special-characters

curl post request failing in the presence of special characters


OK, I know there are already many questions on this topic, but reading every one of them hasn't helped me solve my problem.

I have " hello'© " on my webpage. My goal is to fetch this content as JSON, strip the "hello", and write the remaining content, i.e. "'©", back to the page.
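The "strip the hello" step could look like the sketch below. The question does not show how the prefix is removed, so the helper name strip_prefix and the choice to trim surrounding whitespace first are assumptions. The sketch targets Python 3.

```python
def strip_prefix(text, prefix="hello"):
    """Hypothetical helper: trim surrounding whitespace, then drop a leading prefix."""
    text = text.strip()
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text

# "\u00a9" is the copyright sign, written as an escape to keep the source ASCII-only
print(strip_prefix(" hello'\u00a9 "))  # prints '©
```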

I am using a curl POST request to write back to the webpage. My code for fetching the JSON is as follows:

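import base64
import json
import urllib2

user = 'xxx'
password = 'xxx'
request = urllib2.Request("http://XXXXXXXX.json")
# b64encode, unlike encodestring, does not append a trailing newline to the header value
base64string = base64.b64encode('%s:%s' % (user, password))
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request)   # send the request
newjson = json.loads(result.read().decode('utf-8'))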
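For comparison, here is a minimal Python 3 sketch of the same fetch step (the question's code is Python 2, where urllib2 has since become urllib.request). The function names basic_auth_header, parse_json_bytes and fetch_json are hypothetical, and it assumes the response body is UTF-8. It makes the two points that matter here explicit: the HTTP body arrives as bytes and must be decoded before parsing, and the Basic auth token should not contain a trailing newline.

```python
import base64
import json
import urllib.request

def basic_auth_header(user, password):
    # b64encode returns bytes with no trailing newline; decode it to build the header text
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

def parse_json_bytes(raw):
    # The response body is bytes; decode it (assumed UTF-8) before parsing
    return json.loads(raw.decode("utf-8"))

def fetch_json(url, user, password):
    # Hypothetical wrapper around the request/urlopen steps shown in the question
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(user, password))
    with urllib.request.urlopen(request) as response:
        return parse_json_bytes(response.read())
```

For example, basic_auth_header("user", "pass") returns "Basic dXNlcjpwYXNz".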

At this point, the values in newjson are unicode strings. I discovered that my curl POST request only works with percent-encoding (e.g. "%A3" for £).
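Instead of a hand-written lookup table, the standard library can percent-encode a whole string. In this Python 3 sketch, note that %A3 for £ is the single-byte Windows-1252/Latin-1 value, while UTF-8 percent-encoding gives %C2%A3. Which charset the server expects is an assumption to verify. cp1252 is used here because it matches every non-ASCII entry in the question's table, including %99 for ™. Characters outside cp1252 raise UnicodeEncodeError with this setting.

```python
from urllib.parse import quote

def percent_encode(text, charset="cp1252"):
    # charset="cp1252" is an assumption based on "%A3" working for the server.
    # safe="" escapes everything except A-Z, a-z, 0-9 and "_.-~" as %XX (uppercase hex)
    return quote(text, safe="", encoding=charset)

print(percent_encode("\u00a3"))                   # %A3
print(percent_encode("\u00a3", charset="utf-8"))  # %C2%A3
print(percent_encode("'\u00a9"))                  # %27%A9
```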

What is the best way to do this? The code I wrote is as follows:

encode_dict = {'!':'%21',
               '"':'%22',
               '#':'%23',
               '$':'%24',
               '&':'%26',
               '*':'%2A',
               '+':'%2B',
               '@':'%40',
               '^':'%5E',
               '`':'%60',
               '©':'%A9',
               '®':'%AE',
               '™':'%99',
               '£':'%A3'
              }
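import subprocess

path = "xxxx"
for letter in text1:                # text1 holds the string taken from newjson
    print(letter)
    for keyz, valz in encode_dict.iteritems():
        if letter == keyz:          # the UnicodeWarning below is issued on this line
            text1 = text1.replace(letter, valz)
print(text1)
subprocess.Popen(['curl', '-u', 'xxx:xxx', '-H', 'Content-Type: text/html',
                  '-X', 'POST', '--data', 'text=' + text1, path])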
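Instead of looping over every character and matching it against encode_dict, the whole form body can be encoded in one pass and curl invoked once. This is a Python 3 sketch under these assumptions: the field name text comes from the question's "text=" prefix, while the URL, credentials and the cp1252 charset are placeholders to adjust. Passing an argument list to subprocess (no shell) means special characters need no extra shell quoting.

```python
import subprocess
from urllib.parse import urlencode

def build_curl_args(text, url, user, password, charset="cp1252"):
    # charset is an assumption: cp1252 reproduces "%A3" for the pound sign.
    # urlencode percent-encodes the value (spaces become "+") and builds "text=..."
    body = urlencode({"text": text}, encoding=charset)
    # --data already implies POST with Content-Type application/x-www-form-urlencoded,
    # so neither -X POST nor a custom header is added here
    return ["curl", "-u", f"{user}:{password}", "--data", body, url]

def post_text(text, url, user, password):
    # Runs curl once; check=True raises CalledProcessError if curl exits non-zero
    return subprocess.run(build_curl_args(text, url, user, password), check=True)
```

For example, build_curl_args("'\u00a9", "http://example.com/page", "u", "p") produces a --data value of text=%27%A9.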

This code produces the following warning, pointing at the line "if letter == keyz:":

UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
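The warning comes from comparing two different string types. In Python 2, letter is a unicode character taken from the decoded JSON, while the non-ASCII dict keys such as '©' are byte strings. Python 2 cannot implicitly convert a non-ASCII byte string to unicode, so it warns and treats the two as unequal. Python 3 keeps the same distinction as str versus bytes and returns False without a warning. This small Python 3 illustration assumes the keys were UTF-8 bytes:

```python
letter = "\u00a9"                   # text (str), like a character from the decoded JSON
key_bytes = letter.encode("utf-8")  # bytes, standing in for a non-ASCII Python 2 dict key

print(letter == key_bytes)                  # False: str never equals bytes
print(letter == key_bytes.decode("utf-8"))  # True: both sides are text
```

The practical fix is to keep everything as text: decode bytes once at the input boundary and compare text with text.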

Is there a better way to do this?


Solution

  • The problem was with the encoding. result.read() returns a byte string, which must be decoded to unicode with decode() before it is passed to json.loads(). Then I replaced every non-ASCII character with an XML character reference (for example, © becomes &#169;) by encoding the unicode string to ASCII with encode('ascii', 'xmlcharrefreplace').

    newjson = json.loads(result.read().decode('utf-8').encode("ascii","xmlcharrefreplace"))
    

    Also, learning Unicode basics helped me a great deal! This is an excellent tutorial.
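On Python 3 the answer's one-liner carries over almost unchanged, because json.loads also accepts bytes there (Python 3.6+). This minimal sketch assumes a UTF-8 response body and uses a made-up payload with a text key in place of result.read(). Note that the resulting &#169; is an HTML/XML entity. A web page renders it as ©, but non-HTML consumers will see the literal characters.

```python
import json

raw = '{"text": "hello\'\u00a9"}'.encode("utf-8")  # illustrative stand-in for result.read()

# bytes -> str, then str -> ASCII bytes, turning each non-ASCII character
# into a decimal XML character reference such as &#169;
ascii_json = raw.decode("utf-8").encode("ascii", "xmlcharrefreplace")
newjson = json.loads(ascii_json)
print(newjson["text"])  # hello'&#169;
```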