I have a list of URLs that contain escaped characters. Those characters were set by urllib2.urlopen
when it retrieved the HTML page:
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=history
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh
Is there a way to transform them back to their unescaped form in Python?
P.S.: The URLs are encoded in UTF-8.
Use the urllib package (import urllib):
In Python 2, from the official documentation:
urllib.unquote(string)
Replace %xx escapes by their single-character equivalent.
Example: unquote('/%7Econnolly/') yields '/~connolly/'.
In Python 3, from the official documentation:
urllib.parse.unquote(string, encoding='utf-8', errors='replace')
[…]
Example: unquote('/El%20Ni%C3%B1o/') yields '/El Niño/'.
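Applied to the URLs from the question, a minimal Python 3 sketch could look like the following. The names escaped_urls, unescape_urls and decoded are made up for illustration, and the code assumes the escapes are UTF-8, as stated in the question.

```python
from urllib.parse import unquote

# The three escaped URLs from the question.
escaped_urls = [
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit",
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=history",
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh",
]


def unescape_urls(urls):
    """Return a new list with every %xx escape decoded as UTF-8 (hypothetical helper)."""
    # encoding="utf-8" is already the default; it is spelled out here for clarity.
    return [unquote(url, encoding="utf-8") for url in urls]


decoded = unescape_urls(escaped_urls)
for url in decoded:
    print(url)
```

In Python 2, urllib.unquote returns a byte string when given one, so calling .decode('utf-8') on the result should give the equivalent unicode text.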