Using Python 2.5.2 on Debian Linux, I'm trying to fetch the content of a Spanish URL that contains the Spanish character 'í':
import urllib
url = u'http://mydomain.es/índice.html'
content = urllib.urlopen(url).read()
I'm getting this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 8: ordinal not in range(128)
Before passing the URL to urllib, I tried this:
url = urllib.quote(url)
and this:
url = url.encode('UTF-8')
but they didn't work.
Can you tell me what I'm doing wrong?
Per the applicable standard, RFC 1738, URLs can only contain ASCII characters. Good explanation here, and I quote:
"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."
As the URLs I've given explain, this probably means you'll have to percent-encode that "lowercase i with acute accent": '%ED' if the server expects Latin-1 bytes, or '%C3%AD' if it expects UTF-8.
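To make the percent-encoding concrete, here is a minimal sketch. The helper name build_url is hypothetical, the host is the placeholder from the question, and which byte encoding the server expects is an assumption you would need to check. The idea is to encode the path to bytes first and then quote only the path, because quoting the whole URL would also escape the ':' after "http".

```python
# Compatibility import: urllib.quote on Python 2, urllib.parse.quote on Python 3.
try:
    from urllib import quote
except ImportError:
    from urllib.parse import quote


def build_url(base, path, encoding):
    """Hypothetical helper: percent-encode only the path part of a URL."""
    # Encode to bytes first so quote() never sees non-ASCII unicode text.
    return base + quote(path.encode(encoding))


base = u'http://mydomain.es/'   # placeholder host taken from the question
path = u'\xedndice.html'        # u'\xed' is the character 'í'

# Which of the two the server accepts is an assumption to verify.
print(build_url(base, path, 'latin-1'))  # path becomes %EDndice.html
print(build_url(base, path, 'utf-8'))    # path becomes %C3%ADndice.html
```

The resulting string is plain ASCII, so it can be passed to urllib.urlopen without triggering the UnicodeEncodeError.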