
Decode escaped characters in URL


I have a list of URLs that contain escaped characters. The escapes were introduced by urllib2.urlopen when it retrieved the HTML page:

http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=history
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh 

Is there a way to transform them back to their unescaped form in Python?

P.S.: The URLs are encoded in UTF-8.


Solution

  • Using the urllib package (import urllib in Python 2, import urllib.parse in Python 3):

    Python 2.7

    From the official documentation:

    urllib.unquote(string)

    Replace %xx escapes by their single-character equivalent.

    Example: unquote('/%7Econnolly/') yields '/~connolly/'.

    Python 3

    From the official documentation:

    urllib.parse.unquote(string, encoding='utf-8', errors='replace')

    […]

    Example: unquote('/El%20Ni%C3%B1o/') yields '/El Niño/'.
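
Applied to the URLs from the question, the Python 3 call could look like the sketch below. The names `urls` and `decoded` are illustrative only, and the sketch assumes the escapes are UTF-8 as stated in the question.

```python
from urllib.parse import unquote

# The three sample URLs from the question.
urls = [
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit",
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=history",
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh",
]

# unquote() decodes %xx escapes as UTF-8 by default, so no extra
# encoding argument is needed here.
decoded = [unquote(url) for url in urls]

for url in decoded:
    print(url)
# First line printed:
# http://www.sample1webpage.com/index.php?title=首页&action=edit
```

On Python 2.7, urllib.unquote returns a byte string when given one, so a further .decode('utf-8') would be needed to get a unicode object.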