Tags: python-3.x, web-scraping, unicode, encode

How to convert hex character codes to text in Python


I have this string, which I get from web scraping. I want to convert the hex codes in it to normal text. I tried encode("utf-8"), but it still does not work.

text = 'Hospital Nossa Senhora da Concei&#xE7;&#xE3;o, Porto Alegre, Brazil,Hospital de Base S&#xE3;o Jos&#xE9; do Rio Preto, S&#xE3;o Jos&#xE9; Do Rio Preto, Brazil'
text = text.encode("utf-8")

The expected output is: Hospital Nossa Senhora da Conceição, Porto Alegre, Brazil, Hospital de Base São José do Rio Preto, São José Do Rio Preto

I also tried

text.encode('utf-8').decode('unicode-escape')

but that does not work either. Could anyone help with this?


Solution

  • Use the standard-library html module (HyperText Markup Language support). Its documentation says:

    This module defines utilities to manipulate HTML.

    html.unescape(s)

    Convert all named and numeric character references (e.g. &gt;, &#62;, &#x3e;) in the string s to the corresponding Unicode characters. This function uses the rules defined by the HTML 5 standard for both valid and invalid character references, and the list of HTML 5 named character references.

    New in version 3.4.

    import html
    
    text = 'Hospital Nossa Senhora da Concei&#xE7;&#xE3;o, Porto Alegre, Brazil,Hospital de Base S&#xE3;o Jos&#xE9; do Rio Preto, S&#xE3;o Jos&#xE9; Do Rio Preto, Brazil'
    unescaped_text = html.unescape(text)
    print(unescaped_text)
    

    Output: .\SO\72657237.py

    Hospital Nossa Senhora da Conceição, Porto Alegre, Brazil,Hospital de Base São José do Rio Preto, São José Do Rio Preto, Brazil
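
To illustrate why the encode/decode attempts in the question change nothing, here is a minimal sketch. The short sample strings are made up for this illustration and are not taken from the scraped page. It shows that html.unescape treats named, decimal, and hexadecimal references alike, while str.encode only converts the text to bytes and leaves the references untouched.

```python
import html

# The same word written with named, decimal and hexadecimal character references
# (illustrative sample strings, not the original scraped data).
named = html.unescape("Concei&ccedil;&atilde;o")
decimal = html.unescape("Concei&#231;&#227;o")
hexadecimal = html.unescape("Concei&#xE7;&#xE3;o")
print(named, decimal, hexadecimal)

# encode() just turns the characters into bytes; the references stay as they are.
raw = "S&#xE3;o Jos&#xE9;"
encoded = raw.encode("utf-8")
print(encoded)

# 'unicode-escape' only interprets backslash escapes such as \xe3 or \u00e3,
# so a string of HTML references round-trips unchanged.
round_trip = encoded.decode("unicode-escape")
print(round_trip == raw)

# html.unescape is the step that actually resolves the references.
print(html.unescape(raw))
```

If the scraped text is parsed with an HTML parser rather than pulled out with string operations, the parser normally resolves these references itself, so html.unescape is mainly needed for raw markup fragments.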