In Python 3 with pandas, I have a DataFrame of full names. My default encoding is UTF-8. The names are in Portuguese, so they contain accented characters:
perfis_deputados.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 513 entries, 0 to 512
Data columns (total 10 columns):
data_nascimento 513 non-null object
e_mail 513 non-null object
link_api 513 non-null object
link_foto 513 non-null object
nome_completo 513 non-null object
nome_eleitoral 513 non-null object
partido 513 non-null object
sexo 513 non-null object
telefone 513 non-null object
uf 513 non-null object
dtypes: object(10)
memory usage: 40.2+ KB
The columns "nome_completo" and "nome_eleitoral" have cases like:
AELTON JOSÉ DE FREITAS
JOÃO ALBERTO FRAGA SILVA
ALTINEU CÔRTES
I need to compare the names in this DataFrame with those in another one, but the second DataFrame has the names without any accents. There, the names appear like this, for example:
AELTON JOSE DE FREITAS
JOAO ALBERTO FRAGA SILVA
ALTINEU CORTES
Is there a way to compare the names while ignoring accents, or to remove the accents from the column I'm analyzing?
You can define a function and apply it to the column of your DataFrame like this:
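from unidecode import unidecode  # third-party package: pip install unidecode
def remove_accents(text):
    # Transliterate accented characters to their closest ASCII equivalents
    return unidecode(text)
# apply() returns a new Series, so assign it back to keep the result
perfis_deputados["nome_completo"] = perfis_deputados["nome_completo"].apply(remove_accents)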
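If you would rather not install a third-party package, something similar can be done with the standard library's unicodedata module. This is only a minimal sketch: the helper name strip_accents is my own choice, and it simply drops combining marks after decomposition, so characters with no decomposed form (such as ø or ß) would pass through unchanged.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (hypothetical helper)."""
    # NFD splits each accented character into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


names = ["AELTON JOSÉ DE FREITAS", "JOÃO ALBERTO FRAGA SILVA", "ALTINEU CÔRTES"]
cleaned = [strip_accents(name) for name in names]
print(cleaned)
```

On the DataFrame from the question it would be applied the same way, e.g. perfis_deputados["nome_completo"].apply(strip_accents), and the resulting column can then be compared with the unaccented names in the second DataFrame.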