Search code examples
pythonunicodeurllib2urlopen

How to request a url with non-unicode carachters on main domainname (not params) in Python?


I cannot request url "http://www.besondere-raumdüfte.de" with urllib2.urlopen().
I tried to encode string using urllib.urlencode with utf-8, idna, ascii But still doesn't work.
Raises URLError: <urlopen error unknown url type.


Solution

  • What you need is u"http://www.besondere-raumdüfte.de/".encode('idna'). Please note how the source string is a Unicode constant (the u prefix).

    The result is an URL usable with urlopen().

    If you have a domain name with non-ASCII characters and the rest of the URL contains non-ASCII characters, you need to .encode('idna') the domain part and iri2uri() the rest.