How to compare names with and without accents in pandas?


Using Python 3 and pandas, I have a DataFrame with full names. My default encoding is UTF-8. The names are in Portuguese, so they contain accents (diacritics):

    perfis_deputados.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 513 entries, 0 to 512
    Data columns (total 10 columns):
    data_nascimento    513 non-null object
    e_mail             513 non-null object
    link_api           513 non-null object
    link_foto          513 non-null object
    nome_completo      513 non-null object
    nome_eleitoral     513 non-null object
    partido            513 non-null object
    sexo               513 non-null object
    telefone           513 non-null object
    uf                 513 non-null object
    dtypes: object(10)
    memory usage: 40.2+ KB

The columns "nome_completo" and "nome_eleitoral" have cases like:

AELTON JOSÉ DE FREITAS
JOÃO ALBERTO FRAGA SILVA
ALTINEU CÔRTES

I need to compare the names in this DataFrame with the names in another DataFrame. However, the second DataFrame has names without any accents, so they appear like this, for example:

AELTON JOSE DE FREITAS
JOAO ALBERTO FRAGA SILVA
ALTINEU CORTES

Is there a way to compare the names while ignoring accents? Or to remove the accents from the column I'm analyzing?


Solution

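  • You can define a function and apply it to the column like this:

    import unidecode  # third-party package: pip install unidecode

    def remove_accents(text):
        # Transliterate accented characters to plain ASCII, e.g. "JOSÉ" -> "JOSE"
        return unidecode.unidecode(text)

    # apply() returns a new Series, so assign it back to keep the result
    perfis_deputados["nome_completo"] = perfis_deputados["nome_completo"].apply(remove_accents)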
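If you would rather avoid the third-party `unidecode` dependency, here is a minimal sketch that uses only the standard library's `unicodedata` module. It decomposes each name with NFKD normalization, which splits "É" into "E" plus a combining accent mark, drops the combining marks, and then merges the two DataFrames on the accent-free key. The sample rows and the names `strip_accents`, `outra_base`, `nome` and `nome_sem_acento` are hypothetical stand-ins for the real second DataFrame.

```python
import unicodedata

import pandas as pd


def strip_accents(text: str) -> str:
    """Remove diacritics: decompose with NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical sample data standing in for the two DataFrames in the question.
perfis_deputados = pd.DataFrame(
    {"nome_completo": ["AELTON JOSÉ DE FREITAS", "JOÃO ALBERTO FRAGA SILVA", "ALTINEU CÔRTES"]}
)
outra_base = pd.DataFrame(
    {"nome": ["AELTON JOSE DE FREITAS", "JOAO ALBERTO FRAGA SILVA", "FULANO DE TAL"]}
)

# Build an accent-free comparison key on both sides.
perfis_deputados["nome_sem_acento"] = perfis_deputados["nome_completo"].map(strip_accents)
outra_base["nome_sem_acento"] = outra_base["nome"].map(strip_accents)

# An inner merge keeps only the names present in both DataFrames.
em_comum = perfis_deputados.merge(outra_base, on="nome_sem_acento", how="inner")
print(em_comum["nome_completo"].tolist())
# ['AELTON JOSÉ DE FREITAS', 'JOÃO ALBERTO FRAGA SILVA']
```

Using `how="left"` with `indicator=True` in the merge would instead keep every deputy and flag which names have no match. pandas also offers `Series.str.normalize("NFKD")`, which can be chained with `.str.encode("ascii", errors="ignore").str.decode("ascii")` for a vectorized version; note that this silently drops any character that has no ASCII decomposition.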