python html emoji emoticons

Convert HTML Entity to Python Emoji


Say I have the following HTML emoji entity: '&#x1f604 ;'

Note: there isn't actually a space between the 4 and the ; in the entity. It's only there so that the entity doesn't render as a smiley.

The emoji's Python form is: u"\U0001f604"

How do I convert all HTML emoji entities to their Python form?


Things I have tried so far:

  • Encode to utf-8
  • Unescape the text using HTML Parser and then convert
  • Use a regex (I couldn't get one that worked for all of the HTML emoji entities; it's not as simple as swapping &#x with \U000, as that only works for some entities)
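For what it's worth, the plain string swap fails because \U needs exactly eight hex digits, so the amount of zero padding depends on how long the entity is (&#x1f604; has five digits, &#x2764; has four). A regex can still work if it parses the digits as a number instead of rewriting the text. Below is a minimal sketch, assuming Python 3; replace_hex_entities and HEX_ENTITY are made-up names for illustration, and it only handles hexadecimal references, not decimal or named ones:

```python
import re

# Matches hexadecimal numeric character references such as "&#x1f604;".
# HEX_ENTITY and replace_hex_entities are hypothetical names for this sketch.
HEX_ENTITY = re.compile(r"&#x([0-9a-fA-F]+);")


def _to_char(match):
    code_point = int(match.group(1), 16)
    # Leave out-of-range references untouched instead of raising ValueError.
    if code_point > 0x10FFFF:
        return match.group(0)
    return chr(code_point)


def replace_hex_entities(text):
    """Replace each hex entity with the character for that code point."""
    return HEX_ENTITY.sub(_to_char, text)


converted = replace_hex_entities("smile: &#x1f604; heart: &#x2764;")
print(converted)
```

Because the code point is computed with int(..., 16), the number of digits in the entity no longer matters.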

Solution

  • HTMLParser.unescape does just that:

    In [2]: import HTMLParser

    In [3]: HTMLParser.HTMLParser().unescape('&#x1f604;')
    Out[3]: u'\U0001f604'
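
The session above is Python 2. On Python 3 the HTMLParser module was renamed to html.parser, and the unescape method was deprecated in 3.4 and removed in 3.9; the standard-library replacement is html.unescape. A minimal sketch, assuming Python 3.4 or later (the sample string is made up for illustration):

```python
import html

# Sample input mixing a numeric emoji entity with a named entity.
text = "Hello &#x1f604; &amp; goodbye"

# html.unescape converts named and numeric character references alike.
result = html.unescape(text)
print(result)
```

This also covers decimal references and named entities, so no regex is needed.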