I am working with several data frames that I need to merge. One of them has many duplicates in the merge key, so I used drop_duplicates to remove them.
I then checked the shape of the data frame before (531 rows) and after (167 rows), so I assumed it had worked.
But calling value_counts() on the merge key does not return 1 for each entry. How can I explain this, and how do I fix it?
To make this clearer, here is my code:
df_stores.drop_duplicates(subset = 'Store ID', keep = 'first' )
df_stores['Store ID'].value_counts().sort_index(ascending=True)
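Writing this up as an answer so it is easily accessible to others. By default, drop_duplicates returns a new data frame and leaves df_stores unchanged, so the later value_counts call still sees the duplicates. There are two ways to fix it: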
1.
df_stores.drop_duplicates(subset = 'Store ID', keep = 'first', inplace= True)
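Note: do not use inplace=True everywhere, as it can raise a warning in some cases (for example, pandas may emit SettingWithCopyWarning when df_stores is itself a slice of another data frame).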
2.
df_stores = df_stores.drop_duplicates(subset = 'Store ID', keep = 'first')
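
To see the difference, here is a minimal sketch on made-up data. Only the 'Store ID' column name comes from the question; the 'City' column and all values are assumptions for illustration. It shows that the call without assignment leaves the row count unchanged, and that after assigning the result back every key appears exactly once.

```python
import pandas as pd

# Hypothetical sample data; only the 'Store ID' column name is from the question.
df_stores = pd.DataFrame({
    "Store ID": [1, 1, 2, 3, 3, 3],
    "City": ["A", "A", "B", "C", "C", "C"],
})

# Without assignment the deduplicated copy is discarded; df_stores is untouched.
df_stores.drop_duplicates(subset="Store ID", keep="first")
rows_without_assignment = len(df_stores)

# Assigning the result back keeps the deduplicated data frame.
df_stores = df_stores.drop_duplicates(subset="Store ID", keep="first")
rows_after_assignment = len(df_stores)

# Each key should now be counted exactly once.
counts = df_stores["Store ID"].value_counts().sort_index(ascending=True)
all_unique = bool((counts == 1).all())

print(rows_without_assignment, rows_after_assignment, all_unique)
```

A quick check before merging is df_stores['Store ID'].is_unique, which is True only when the key has no duplicates.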