Convert unicode series from a webpage to Chinese characters with Python

This is my first experience with Unicode, and also with escaping, and I'm in over my head. The source is a website's pull-down menu, and I want to generate a text list of all the items using Python.

From &#x65B0;&#x5317;&#x5E02 I understand that I need to make something that looks like u'\u65B0\u5317\u5E02' in order to see 新北市 when I print it.

However ''.join([s.replace('&#x', '\u') for s in '&#x65B0;&#x5317;&#x5E02'.split(';')]) fails:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

and ''.join([s.replace('&#x', '\\u') for s in '&#x65B0;&#x5317;&#x5E02'.split(';')]) (double backslash) gives me '\\u65B0\\u5317\\u5E02'

Question: What expression for mystring will make print(mystring) show 新北市?

Solution

Since what you're dealing with are really HTML entities (numeric character references), you can simply parse the input with html.unescape from the standard library (Python 3.4+):

import html
print(html.unescape('&#x65B0;&#x5317;&#x5E02'))

This outputs:

新北市
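
If it helps to see why the replace-based attempts fall short, here is a rough sketch of doing the same conversion by hand. A \uXXXX sequence only means something inside a string literal in source code, so swapping '&#x' for a backslash and a u at runtime just produces literal text. The character has to be built from its code point instead, with chr(int(hex_digits, 16)). The function name decode_hex_entities is made up for this illustration, and it only handles hexadecimal references; html.unescape remains the more complete option.

```python
import re

# Matches a hexadecimal character reference such as &#x65B0; and captures
# the hex digits. The trailing semicolon is treated as optional because
# the input in the question omits the last one.
HEX_ENTITY = re.compile(r'&#[xX]([0-9a-fA-F]+);?')

def decode_hex_entities(text):
    """Replace each hexadecimal character reference with its character."""
    # int(digits, 16) turns the hex digits into a code point,
    # and chr() turns the code point into a one-character string.
    return HEX_ENTITY.sub(lambda m: chr(int(m.group(1), 16)), text)

print(decode_hex_entities('&#x65B0;&#x5317;&#x5E02'))
```

Text without any references passes through unchanged, so the function can be applied to every item of the menu in turn.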