In Python 3 with pandas, I need to remove duplicate rows from a dataframe based on repeated values in one column. For this I used:
consolidado = df_processos.drop_duplicates(['numero_unico'], keep='last')
The column "numero_unico" holds codes stored as strings, such as 0029126-45.2019.1.00.0000, 0026497-98.2019.1.00.0000, 0027274-83.2019.1.00.0000...
So the command above keeps only the last occurrence of each code.
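A minimal reproduction of the behaviour described above, assuming a small made-up df_processos (the extra column "valor" is hypothetical, added only so the surviving rows are easy to identify). Because drop_duplicates treats "Sem número único" like any other repeated value, only its last occurrence survives:

```python
import pandas as pd

# Hypothetical sample data: one repeated code and two "Sem número único" rows
df_processos = pd.DataFrame({
    'numero_unico': [
        '0029126-45.2019.1.00.0000',
        'Sem número único',
        '0029126-45.2019.1.00.0000',
        'Sem número único',
        '0026497-98.2019.1.00.0000',
    ],
    'valor': [1, 2, 3, 4, 5],  # hypothetical column, only to identify rows
})

# keep='last' keeps the last row of every repeated value, including the exception text
consolidado = df_processos.drop_duplicates(['numero_unico'], keep='last')
print(consolidado)
```

Only rows 2, 3 and 4 remain, so just one "Sem número único" row is left.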
However, the column does not always contain codes. Several rows contain the text "Sem número único" ("no unique number").
I want to keep every row that has this value, but the command above keeps only the last "Sem número único" row.
Does anyone know how to use drop_duplicates with an exception like this?
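Example, from my comment on the question: deduplicate only the rows that do not hold the exception value, then append all the rows that do.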
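import pandas
df = pandas.DataFrame({
    'a': ['snu', 'snu', '002', '002', '003', '003'],
    'b': [1, 2, 2, 1, 5, 6]
})
# Deduplicate the non-'snu' rows, then add back every 'snu' row unchanged
df_dedupe = pandas.concat([
    df[df['a'] != 'snu'].drop_duplicates(['a'], keep='last'),
    df[df['a'] == 'snu']
])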
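A possible alternative, sketched on the same toy data: instead of splitting and concatenating, build a boolean mask with DataFrame.duplicated and OR it with a test for the exception value. This keeps the rows in their original order, whereas the concat version places the exception rows at the end. The name EXCEPTION is just an illustrative constant.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['snu', 'snu', '002', '002', '003', '003'],
    'b': [1, 2, 2, 1, 5, 6],
})

EXCEPTION = 'snu'  # illustrative: value whose rows must all be kept

# True for the last occurrence of each value in 'a' (same semantics as keep='last')
is_last_occurrence = ~df.duplicated(['a'], keep='last')
# True for every row holding the exception value, however many there are
is_exception = df['a'] == EXCEPTION

# A row survives if it is a last occurrence OR it is an exception row
df_dedupe = df[is_last_occurrence | is_exception]
print(df_dedupe)
```

Applied to the question's dataframe, the same idea would be df_processos[~df_processos.duplicated(['numero_unico'], keep='last') | (df_processos['numero_unico'] == 'Sem número único')].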