python, unicode, python-3.6, urlopen

UnicodeEncodeError with urlopen(..net/cé..)


I'm trying to open a URL that contains a non-ASCII character (é).
When I pass the URL directly to urlopen, I get this error:

from urllib.request import urlopen
uClient = urlopen("https://www.mypage.net/céline")

>>> UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 14: ordinal not in range(128)

When I write it like this it works:

from urllib.request import urlopen
uClient = urlopen("https://www.mypage.net/c%C3%A9line")

But I need this conversion to happen automatically. When I try encode on the string, the output looks like this:

without: https://www.mypage.net/céline

utf-8: b'https://www.mypage.net/c\xc3\xa9line'

latin-1: b'https://www.mypage.net/c\xe9line'

ascii: b'https://www.mypage.net/cline'

So the question is: how do I convert the string "https://www.mypage.net/céline" into something that the urlopen function can use?

I'm working with the Atom editor and Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32

Thanks!


Solution

  • The request URL must be percent-encoded (URL-escaped) before it is passed to urlopen, because the request is sent as ASCII.

    For your example, the following builds a properly encoded URL and opens it:

    import urllib.parse
    import urllib.request
    protohost = 'https://example.com/'
    path = 'céline'
    urllib.request.urlopen(f'{protohost}{urllib.parse.quote(path)}')
    

    Note that only the path is encoded; the resulting URL looks like this:

    >>> f'{protohost}{urllib.parse.quote(path)}'
    'https://example.com/c%C3%A9line'
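
If the URL arrives as a single string rather than as separate host and path pieces, one possible approach is to split it, percent-encode only the parts that may contain non-ASCII characters, and reassemble it. The sketch below is only an illustration: the helper name to_ascii_url is hypothetical, it assumes the host name itself is plain ASCII, and the choice of "safe" characters is a judgment call rather than a rule taken from the question.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode the path, query and fragment of `url` (hypothetical helper).

    Assumption: the scheme and host are already ASCII, so they are left untouched.
    """
    parts = urlsplit(url)
    # "%" is listed as safe so that sequences which are already
    # percent-encoded (e.g. %C3%A9) are not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&+%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(to_ascii_url("https://www.mypage.net/céline"))
# https://www.mypage.net/c%C3%A9line
```

The result can then be passed to urlopen as usual, for example urlopen(to_ascii_url("https://www.mypage.net/céline")). Quoting the whole URL with a bare quote() call would not work, because it would also escape the ":" after the scheme.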