python, pandas, dataframe, duplicates, data-cleaning

How to move ALL duplicated rows into a separate dataframe


My code removes all duplicates using drop_duplicates with keep=False.

Before I remove the duplicates, I want to move every row that will be dropped into a separate dataframe. I've come up with the lines of code below, but I think the first one leaves one row of each duplicate group behind instead of capturing ALL duplicates.

duplicates_df = combined_df.loc[combined_df.duplicated(subset='Unique_ID_Count'), :]

combined_df.drop_duplicates(subset='Unique_ID_Count', inplace=True, keep=False)

Do you have any ideas on how I can move all duplicates dropped in the second line of code to the duplicates_df dataframe?

Any help would be much appreciated, thanks!


Solution

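  • duplicated() defaults to keep='first', which flags every repeat except the first occurrence, so that first row never reaches duplicates_df. Pass keep=False to flag every row in a duplicate group, and use the negated mask to keep only the unique rows: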

    duplicates_df = combined_df.loc[combined_df.duplicated(subset='Unique_ID_Count', keep=False)]
    combined_df   = combined_df.loc[~combined_df.duplicated(subset='Unique_ID_Count', keep=False)]
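To check the behaviour on a small case, here is a minimal sketch. The sample data and the `value` column are made up for illustration; only the `Unique_ID_Count` column name comes from the question. It also builds the mask once and reuses it, which is optional but avoids scanning for duplicates twice.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'Unique_ID_Count'
# is taken from the question.
combined_df = pd.DataFrame({
    "Unique_ID_Count": [1, 2, 2, 3, 3, 3, 4],
    "value": ["a", "b", "c", "d", "e", "f", "g"],
})

# keep=False flags every row whose key occurs more than once.
# The default (keep='first') would leave the first occurrence unflagged.
dup_mask = combined_df.duplicated(subset="Unique_ID_Count", keep=False)

duplicates_df = combined_df.loc[dup_mask]   # all rows from duplicate groups
combined_df = combined_df.loc[~dup_mask]    # only rows whose key is unique

print(duplicates_df["Unique_ID_Count"].tolist())  # [2, 2, 3, 3, 3]
print(combined_df["Unique_ID_Count"].tolist())    # [1, 4]
```

Every row ends up in exactly one of the two dataframes, so the row counts of duplicates_df and combined_df should add up to the original length.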