In Python, how do I encode/decode Unicode characters such as ö?

Using Python 2.6.6 on CentOS 6.4

import json
import urllib2    

url = ''
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
opener.addheaders = [('Accept-Charset', 'utf-8')]
response = opener.open(url)
page = response.read()
print page


...<suggestion data="how to pronounce eyjafjallaj

at which Python dies with no error message.

I think it dies because the next character is ö:

<suggestion data="how to pronounce edinburgh"/>
<suggestion data="how to pronounce elle"/>
<suggestion data="how to pronounce edith"/>
<suggestion data="how to pronounce et al"/>
<suggestion data="how to pronounce eunice"/>
<suggestion data="how to pronounce english names"/>
<suggestion data="how to pronounce edamame"/>
<suggestion data="how to pronounce erudite"/>
<suggestion data="how to pronounce eyjafjallajökull"/>
<suggestion data="how to pronounce either"/>

This appears to be a Unicode issue. I have tried encode('utf-8') and decode('utf-8') in many ways, but it still dies. Any ideas?

PS: It seems I need to stay with urllib2 rather than urllib, because urllib ignores cookies, which causes other problems.


  • response.read() returns a bytestring. Python shouldn't die while printing a bytestring because no character conversion occurs; the bytes are printed as is.

    You could try to print Unicode instead:

    text = page.decode(response.info().getparam('charset') or 'utf-8')
    print text
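
To try the decoding step without touching the network, here is a minimal sketch. The helper name decode_page and the sample Content-Type values are made up for illustration; it assumes the charset, when present, is given as a parameter of the Content-Type header, and it falls back to UTF-8 otherwise.

```python
from email.message import Message


def decode_page(raw, content_type=None, default='utf-8'):
    """Decode raw response bytes to Unicode text.

    Hypothetical helper: uses the charset parameter of the
    Content-Type header if there is one, otherwise `default`.
    """
    msg = Message()
    if content_type:
        msg['Content-Type'] = content_type
    # get_param returns None when the header or the parameter is missing
    charset = msg.get_param('charset') or default
    return raw.decode(charset)


# The bytes a server would send for the problematic suggestion (UTF-8 encoded)
raw = b'<suggestion data="how to pronounce eyjafjallaj\xc3\xb6kull"/>'
text = decode_page(raw, 'text/xml; charset=utf-8')

# Encode explicitly when bytes are needed again, e.g. for a file or a pipe
round_trip = text.encode('utf-8')
```

With the real response, the header value would come from the response object rather than a literal. If printing text still fails, the terminal or locale encoding is the next thing to check, since that is where Unicode is converted back to bytes.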