I'm trying to open a URL that has a unicode character (é).
When I write it directly in the function I get this error:
from urllib.request import urlopen
uClient = urlopen("https://www.mypage.net/céline")
>>> UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 14: ordinal not in range(128)
When I write it like this it works:
from urllib.request import urlopen
uClient = urlopen("https://www.mypage.net/c%C3%A9line")
But it should work automatically. When I try encode on the string, the output looks like this:
without: https://www.mypage.net/c�line
utf-8: b'https://www.mypage.net/c\xc3\xa9line'
latin-1: b'https://www.mypage.net/c\xe9line'
ascii: b'https://www.mypage.net/cline'
So the question is: how do I convert the string "https://www.mypage.net/céline" into something that the urlopen function can use?
I'm working with the Atom editor and Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Thanks!
The request URL must be properly percent-encoded (URL-escaped) to work with urlopen.
In your example, this gives you a properly encoded url:
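import urllib.parse
import urllib.request

protohost = 'https://example.com/'
path = 'céline'
# Quote only the path; quoting the whole URL would also escape the ':' after 'https'.
urllib.request.urlopen(f'{protohost}{urllib.parse.quote(path)}')
Note here that the encoded portion looks like:
>>> f'{protohost}{urllib.parse.quote(path)}'
'https://example.com/c%C3%A9line'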
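If you only have the full URL as one string and want the escaping to happen automatically, one possible approach is to split the URL, quote the individual parts, and join it again. The following is a minimal sketch, not the only way to do it: `encode_url` is a hypothetical helper name, and it assumes the host name itself is plain ASCII (a non-ASCII host would need IDNA encoding, which is not handled here).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url):
    """Percent-encode non-ASCII characters in the path, query and fragment.

    Hypothetical helper for illustration; the scheme and host are passed
    through unchanged, so a non-ASCII host name is NOT handled.
    """
    parts = urlsplit(url)
    # '%' is treated as safe so already-escaped sequences are not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(encode_url("https://www.mypage.net/céline"))
# https://www.mypage.net/c%C3%A9line
```

The result can then be passed straight to `urlopen(encode_url(url))`. Because `%` is left untouched, calling the helper on a URL that is already escaped returns it unchanged.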