Tags: python, utf-8, latin1

Python and string accents


I am making a web scraper.
I run a Google search, take the link to each result page, and then read the contents of that page's <title> tag.
The problem is that a string such as "P\xe1gina N\xe3o Encontrada!" should be displayed as "Página Não Encontrada!". I tried decoding it as latin-1 and then encoding it as utf-8, but it did not work.

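    import requests
    from bs4 import BeautifulSoup

    r2 = requests.get(item_str)  # item_str: URL of one search result
    texto_pagina = r2.text
    soup_item = BeautifulSoup(texto_pagina, "html.parser")
    empresa = soup_item.find_all("title")
    # empresa_str holds the title text taken from empresa (that step is not shown here)
    print(empresa_str.decode('latin1').encode('utf8'))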

Can you help me, please? Thanks!


Solution

  • You can encode the retrieved text explicitly, with something like:

    string = u'P\xe1gina N\xe3o Encontrada!'.encode('utf-8')
    

    When I printed string after that, the accented characters came out fine for me.


    Edit

    Instead of adding .encode('utf8'), have you tried just using empresa_str.decode('latin1')?

    As in:

    string = empresa_str.decode('latin1')
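
For anyone on Python 3, here is a minimal sketch of the two directions discussed above: encoding text to bytes and decoding bytes back to text. Note that in Python 3 a str has no .decode() method, so decoding only applies to bytes. The names titulo and decode_title are made up for this illustration, and the fallback order (UTF-8 first, then Latin-1) is an assumption, not something taken from the question.

```python
# Python 3 sketch: text (str) versus encoded bytes.
# `titulo` and `decode_title` are hypothetical names used only for illustration.

# '\xe1' and '\xe3' are just the escaped forms of the accented letters,
# so this is already the correct text.
titulo = "P\xe1gina N\xe3o Encontrada!"

# Encoding turns text into bytes; each accented letter takes two bytes in UTF-8.
utf8_bytes = titulo.encode("utf-8")

# The same text as Latin-1 bytes: one byte per accented letter.
latin1_bytes = titulo.encode("latin-1")


def decode_title(raw, encodings=("utf-8", "latin-1")):
    """Try each encoding in order and return the first successful decode."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the encodings could decode the input")


# Latin-1 bytes are not valid UTF-8 here, so the helper falls back to Latin-1.
print(decode_title(latin1_bytes) == titulo)  # True
```

Latin-1 maps every possible byte to a character, so it never raises a decode error. That makes it usable as a last resort, but it also means it will silently produce wrong characters if the bytes were really in some other encoding.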