Is there a standard, preferably Pythonic, way to convert the &#xxxx; notation to a proper Unicode string?
For example,
&#1502;&#1508;&#1490;&#1513;&#1497;
should be converted to:
מפגשי
It can be done quite easily with string manipulation, but I wonder whether the standard library already provides something for this.
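For reference, the hand-rolled string-manipulation approach might look like the following minimal sketch (Python 3). The helper name `decode_numeric_refs` is hypothetical. It assumes the input contains only numeric character references (decimal `&#1502;` or hexadecimal `&#x5DE;`); it leaves named entities such as `&amp;` untouched and does no error handling, so an out-of-range code point makes `chr` raise `ValueError`.

```python
import re

# Hypothetical helper: matches decimal (&#1502;) and hexadecimal (&#x5DE;)
# numeric character references only; named entities are not handled.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they denote."""
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        # Exactly one group matched: hex digits use base 16, decimal base 10.
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)  # raises ValueError for code points > 0x10FFFF

    return _NUMERIC_REF.sub(_replace, text)


print(decode_numeric_refs("&#1502;&#1508;&#1490;&#1513;&#1497;"))
```

This works for the example in the question, but the standard library route avoids maintaining the regex and covers named entities as well.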
>>> from HTMLParser import HTMLParser
>>> h = HTMLParser()
>>> s = "&#1502;&#1508;&#1490;&#1513;&#1497;"
>>> print h.unescape(s)
מפגשי
It's part of the standard library, too.
However, if you're using Python 3, use html.unescape (added in Python 3.4) instead. The html.parser.HTMLParser.unescape method was deprecated in Python 3.4 and removed in Python 3.9:
>>> import html
>>> s = '&#1502;&#1508;&#1490;&#1513;&#1497;'
>>> print(html.unescape(s))
מפגשי
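On Python 3.4 and later, the module-level html.unescape function is not limited to decimal references: it also decodes hexadecimal references (the &#x... form) and named entities. A short sketch showing all three forms; the sample strings are illustrative only:

```python
import html

# Illustrative inputs: the same Hebrew word as decimal and hexadecimal
# numeric character references, plus a string with named entities.
samples = [
    "&#1502;&#1508;&#1490;&#1513;&#1497;",   # decimal references
    "&#x5DE;&#x5E4;&#x5D2;&#x5E9;&#x5D9;",   # hexadecimal references
    "caf&eacute; &amp; cr&egrave;me",        # named entities
]

for s in samples:
    # html.unescape returns a new str with every reference decoded.
    print(html.unescape(s))
```

The first two samples decode to the same string, so there is no need to normalize the notation before calling html.unescape.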