Pandas drop_duplicates function doesn't work as expected


I am working with several data frames that I need to merge. One of them has many duplicates in the merge key, so I used drop_duplicates to remove them. I then checked the shape of the data frame before (531 rows) and after (167 rows), so I assumed it had worked.
But when I call value_counts() on the merge key column, it does not return 1 for every value of the key. How can I explain this, and how do I fix it?

Here is my code:

df_stores.drop_duplicates(subset = 'Store ID', keep = 'first' )

df_stores['Store ID'].value_counts().sort_index(ascending=True)

Solution

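  • Posting this as an answer so it is easy for others to find. drop_duplicates returns a new data frame and leaves the original unchanged by default, so the result of the call in the question was discarded. There are two ways to keep it:

    1. df_stores.drop_duplicates(subset = 'Store ID', keep = 'first', inplace = True)

    Note: Do not rely on inplace = True everywhere. In some cases (for example, when df_stores is itself a slice of another data frame) pandas can emit a warning such as SettingWithCopyWarning.

    2. df_stores = df_stores.drop_duplicates(subset = 'Store ID', keep = 'first')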
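
To see the difference, here is a minimal sketch of the second option. The sample data is made up for illustration; only the 'Store ID' column name comes from the question, and the 'City' column and helper variable names are assumptions.

```python
import pandas as pd

# Hypothetical sample data; only the 'Store ID' column name is from the question.
df_stores = pd.DataFrame({
    "Store ID": [1, 1, 2, 3, 3, 3],
    "City": ["A", "A", "B", "C", "C", "C"],
})

# Without assignment, the deduplicated frame is returned and then thrown away,
# so df_stores still holds every original row.
df_stores.drop_duplicates(subset="Store ID", keep="first")
rows_without_assignment = len(df_stores)

# Assigning the result back replaces df_stores with the deduplicated frame.
df_stores = df_stores.drop_duplicates(subset="Store ID", keep="first")
rows_after_assignment = len(df_stores)

# Every 'Store ID' should now appear exactly once.
counts = df_stores["Store ID"].value_counts().sort_index(ascending=True)
all_unique = bool((counts == 1).all())

print(rows_without_assignment, rows_after_assignment, all_unique)  # 6 3 True
```

If the counts are still above 1 after reassigning, it may be worth checking whether the key values differ in whitespace, case, or dtype, since drop_duplicates only removes exact matches.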