I am making a web scraper.
I query Google Search, get the link to each result page, and then read the contents of that page's <title> tag.
The problem is that, for example, the string "P\xe1gina N\xe3o Encontrada!" should be "Página Não Encontrada!".
I tried to decode from latin-1 and then encode to utf-8, but it did not work.
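import requests
from bs4 import BeautifulSoup

r2 = requests.get(item_str)  # item_str holds the link taken from the search results
texto_pagina = r2.text
soup_item = BeautifulSoup(texto_pagina, "html.parser")
empresa = soup_item.find_all("title")
# note: empresa_str is not defined anywhere in this snippet
print(empresa_str.decode('latin1').encode('utf8'))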
Can you help me, please? Thanks!
You can change the retrieved text variable to something like:
string = u'P\xe1gina N\xe3o Encontrada!'.encode('utf-8')
After printing string, it seemed to work just fine for me.
Edit
Instead of adding .encode('utf8'), have you tried just using empresa_str.decode('latin1')?
As in:
string = empresa_str.decode('latin_1')
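For anyone reading this on Python 3, where bytes and text are separate types, here is a minimal sketch of the same idea. It assumes the title arrived as raw Latin-1 bytes; the `raw` value is just the example string from the question written as a bytes literal, not real scraper output, and the names `raw`, `text` and `utf8_bytes` are only for illustration.

```python
# Assumption: the title was received as Latin-1 encoded bytes,
# matching the "P\xe1gina N\xe3o Encontrada!" example in the question.
raw = b'P\xe1gina N\xe3o Encontrada!'

# Decoding turns the bytes into a proper text string.
text = raw.decode('latin_1')
print(text)

# Encode to UTF-8 only when bytes are needed again (e.g. writing to a file
# opened in binary mode); for printing, the decoded text is enough.
utf8_bytes = text.encode('utf-8')
```

If the decoded text still looks wrong, the page may not actually be Latin-1, so it is worth checking which encoding the server reports before decoding.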