python utf-8 character-encoding special-characters beautifulsoup

How to unescape special characters from BeautifulSoup output?


I am having trouble with special characters such as &deg; and &reg;, which represent the degree sign and the registered-trademark sign.

When I print a string that contains these special characters, the output looks like this:

Preheat oven to 350&deg; F
Welcome to Lorem Ipsum Inc&reg;

Is there a way I can output the exact characters and not their codes? Please let me know.


Solution

  • $ python -c'from BeautifulSoup import BeautifulSoup
    > print BeautifulSoup("""<html>Preheat oven to 350&deg; F
    > Welcome to Lorem Ipsum Inc&reg;""",
    > convertEntities=BeautifulSoup.HTML_ENTITIES).contents[0].string'
    Preheat oven to 350° F
    Welcome to Lorem Ipsum Inc®
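
The one-liner above relies on Python 2 and the BeautifulSoup 3 `convertEntities` argument. On Python 3, a minimal sketch of the same idea can use the standard library's `html.unescape`, which turns named and numeric HTML entities into their Unicode characters. The variable names `raw` and `decoded` are illustrative only, and the sample text is the one from the question. If the text comes from BeautifulSoup 4 (`bs4`) instead, entities are already converted to Unicode during parsing, so this extra step should not be needed there.

```python
import html

# Sample text from the question, still containing HTML entities.
raw = "Preheat oven to 350&deg; F\nWelcome to Lorem Ipsum Inc&reg;"

# html.unescape replaces entities such as &deg; and &reg; with the
# corresponding Unicode characters.
decoded = html.unescape(raw)

print(decoded)
```

If the terminal still shows codes or garbled bytes after this step, the problem is likely the output encoding rather than the entities themselves.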