
Recursive decoding of a URI component in Python, like in JavaScript


I have an encoded URI component "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2". I was able to convert it to "http://www.yelp.com/biz/carriage-house-café-houston-2" by applying the decodeURIComponent function recursively, as below:

function recursiveDecodeURIComponent(uriComponent) {
    try {
        var decodedURIComponent = decodeURIComponent(uriComponent);
        if (decodedURIComponent == uriComponent) {
            return decodedURIComponent;
        }
        return recursiveDecodeURIComponent(decodedURIComponent);
    } catch (e) {
        return uriComponent;
    }
}
console.log(recursiveDecodeURIComponent("http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"))

Outputs: "http://www.yelp.com/biz/carriage-house-café-houston-2".

I would like to get the same result in Python. I tried the following:

print urllib2.unquote(urllib2.unquote(urllib2.unquote("http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2").decode("utf-8")))

but I got http://www.yelp.com/biz/carriage-house-cafÃ©-houston-2. Instead of the expected character é, I get 'Ã©', no matter how many times I call urllib2.unquote.

I am using Python 2.7.3. Can anyone help me?


Solution

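  • I guess a simple loop should do the trick. Unquote the byte string until it stops changing, and only then decode the result as UTF-8 (decoding to unicode before the last unquote is what produces 'Ã©'):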

    import urllib2

    uri = "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"
    
    while True:
        dec = urllib2.unquote(uri)
        if dec == uri:
            break
        uri = dec
    
    uri = uri.decode('utf8')
    print '%r' % uri  
    # u'http://www.yelp.com/biz/carriage-house-caf\xe9-houston-2'
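
Since the question is also tagged python-3.x, here is a rough Python 3 sketch of the same loop. It assumes `urllib.parse.unquote`, which decodes percent-escapes as UTF-8 by default, so no separate `.decode()` step is needed. The function name `recursive_unquote` and the `max_rounds` cap are my own additions for illustration, not part of any library.

```python
from urllib.parse import unquote


def recursive_unquote(uri, max_rounds=10):
    """Percent-decode `uri` repeatedly until it stops changing.

    `max_rounds` is a hypothetical safety cap on the number of passes.
    """
    for _ in range(max_rounds):
        decoded = unquote(uri)  # %XX sequences are decoded as UTF-8 by default
        if decoded == uri:
            break
        uri = decoded
    return uri


result = recursive_unquote(
    "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"
)
# ascii() escapes non-ASCII characters, so the output does not depend on the terminal encoding
print(ascii(result))
# 'http://www.yelp.com/biz/carriage-house-caf\xe9-houston-2'
```

Like the JavaScript version, this keeps decoding as long as anything changes, so a URL that legitimately contains a literal "%41" after one round of decoding would be turned into "A". If that matters, decode a fixed number of times instead.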