I have an encoded URI component, "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2".
I was able to convert it to "http://www.yelp.com/biz/carriage-house-café-houston-2"
by applying the decodeURIComponent function recursively, as below:
function recursiveDecodeURIComponent(uriComponent) {
    try {
        var decodedURIComponent = decodeURIComponent(uriComponent);
        if (decodedURIComponent == uriComponent) {
            return decodedURIComponent;
        }
        return recursiveDecodeURIComponent(decodedURIComponent);
    } catch (e) {
        return uriComponent;
    }
}
console.log(recursiveDecodeURIComponent("http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"))
Outputs: "http://www.yelp.com/biz/carriage-house-café-houston-2".
I would like to do the same in Python. I tried the following:
print urllib2.unquote(urllib2.unquote(urllib2.unquote("http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2").decode("utf-8")))
but I got http://www.yelp.com/biz/carriage-house-café-houston-2.
Instead of the expected character é, I got 'é', no matter how many times I call urllib2.unquote.
I am using Python 2.7.3. Can anyone help me?
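I guess a simple loop should do the trick: keep unquoting the byte string until it stops changing, and only then decode it as UTF-8. (Decoding to unicode before the final unquote is what turns é into 'é'.)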
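import urllib2

uri = "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"
# Unquote repeatedly until the string stops changing.
while True:
    dec = urllib2.unquote(uri)
    if dec == uri:
        break
    uri = dec
# Decode the fully unquoted byte string once, at the end.
uri = uri.decode('utf8')
print '%r' % uri
# u'http://www.yelp.com/biz/carriage-house-caf\xe9-houston-2'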
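For anyone on Python 3, the same loop might look roughly like the sketch below. It assumes urllib.parse.unquote, which decodes percent-escapes as UTF-8 by default, so no separate decode step is needed. The names fully_unquote and max_rounds are made up for this example; max_rounds is only a safety cap on the number of passes.

```python
from urllib.parse import unquote


def fully_unquote(uri, max_rounds=10):
    """Percent-decode `uri` repeatedly until it stops changing.

    `fully_unquote` and `max_rounds` are hypothetical names for this sketch.
    """
    for _ in range(max_rounds):
        decoded = unquote(uri)  # %XX sequences are decoded as UTF-8 by default
        if decoded == uri:
            break  # nothing left to decode
        uri = decoded
    return uri


result = fully_unquote(
    "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"
)
# ascii() escapes non-ASCII characters, similar to '%r' in the Python 2 version.
print(ascii(result))
# 'http://www.yelp.com/biz/carriage-house-caf\xe9-houston-2'
```

Note that decoding until the string is stable will over-decode a URI that is supposed to contain a literal percent sequence, so this is only appropriate when you know the input was encoded more than once by mistake.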