python pandas dataframe exception drop-duplicates

In pandas, how do I use drop_duplicates with one exception?


In Python 3 with pandas, I need to remove rows from a DataFrame that have repeated values in one column. For this I used:

consolidado = df_processos.drop_duplicates(['numero_unico'], keep='last')

The column "numero_unico" holds codes as strings, such as 0029126-45.2019.1.00.0000, 0026497-98.2019.1.00.0000, 0027274-83.2019.1.00.0000...

So the command above keeps only the last occurrence of each code.

But the column does not always contain a code. In several rows the value is "Sem número único" ("no unique number").

I want to keep every row that has this value. With the command above, the resulting DataFrame keeps only the last occurrence of "Sem número único".

Does anyone know how to use drop_duplicates with one exception?
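A small reproduction of the problem may help. The data below is made up for illustration: only the column name "numero_unico" and the code format come from the question, and the "andamento" column is a hypothetical second column.

```python
import pandas as pd

# Hypothetical sample data; 'andamento' is an invented column for illustration.
df_processos = pd.DataFrame({
    'numero_unico': [
        '0029126-45.2019.1.00.0000',
        'Sem número único',
        '0029126-45.2019.1.00.0000',
        'Sem número único',
    ],
    'andamento': ['a', 'b', 'c', 'd'],
})

# The placeholder text is treated like any other repeated value,
# so only its last occurrence survives.
consolidado = df_processos.drop_duplicates(['numero_unico'], keep='last')
n_sem_numero = int((consolidado['numero_unico'] == 'Sem número único').sum())
print(consolidado)
```

Here two "Sem número único" rows go in and only one comes out, which is the behaviour I want to avoid.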


Solution

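  • Example from my comment on the question: split off the rows that hold the exception value, deduplicate the remaining rows, and concatenate the two parts. Here 'snu' presumably stands in for "Sem número único".

    import pandas

    df = pandas.DataFrame({
        'a': ['snu', 'snu', '002', '002', '003', '003'],
        'b': [1, 2, 2, 1, 5, 6]
    })
    df_dedupe = pandas.concat([
        df[df['a']=='snu'],  # keep every row with the exception value
        df[df['a']!='snu'].drop_duplicates(['a'], keep='last')  # dedupe the rest
    ])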
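Note that the concat approach puts all exception rows first in the result. If the original row order matters, one possible variant (a sketch, not part of the original answer) is to build a boolean mask with DataFrame.duplicated and keep a row when it either holds the exception value or is not flagged as a duplicate. The sample data is reordered here so the difference is visible, and 'snu' is again an assumed stand-in for the real placeholder text.

```python
import pandas as pd

# Hypothetical data with exception rows interleaved among the codes.
df = pd.DataFrame({
    'a': ['snu', '002', 'snu', '002', '003', '003'],
    'b': [1, 2, 3, 4, 5, 6],
})

# True for rows holding the exception value.
is_exception = df['a'] == 'snu'

# True for every occurrence of a value except its last one.
is_dup = df.duplicated(subset=['a'], keep='last')

# Keep exception rows unconditionally, plus the last occurrence of each code.
df_dedupe = df[is_exception | ~is_dup]
print(df_dedupe)
```

Boolean indexing leaves the surviving rows in their original positions, so no re-sorting is needed afterwards.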