python pandas duplicates comments

How to filter only duplicate comments


Hi :) My dataset has two columns: sentiment and comment. How can I filter it so that only the duplicate comments remain? Thank you for your help :)


Solution

  • It depends on which columns should define a duplicate record.

    Example 1 - based on all columns in a data frame called df

    duplicates = df[df.duplicated(keep=False)]  # keep=False marks every occurrence of a duplicate, not only the repeats
    

    Example 2 - based on a certain column or columns

    duplicates = df[df["comment"].duplicated(keep=False)]  # checks only the "comment" column from the question
    # For several columns: df[df.duplicated(subset=["sentiment", "comment"], keep=False)]
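
To see how the choice of columns changes the result, here is a minimal sketch on made-up data. The sample rows are hypothetical; only the column names sentiment and comment come from the question. The last part, which lowercases and strips whitespace before comparing, is an optional extra that may or may not suit your data.

```python
import pandas as pd

# Hypothetical sample data using the two columns from the question.
df = pd.DataFrame({
    "sentiment": ["pos", "neg", "pos", "neg", "pos", "pos"],
    "comment": ["great", "bad", "great", "awful", "bad", " Great "],
})

# Rows whose comment text appears more than once (every occurrence is kept).
dup_comments = df[df["comment"].duplicated(keep=False)]

# Rows that repeat across both columns, a stricter notion of "duplicate".
dup_rows = df[df.duplicated(subset=["sentiment", "comment"], keep=False)]

# Optional: ignore case and surrounding whitespace when comparing comments.
normalized = df["comment"].str.strip().str.lower()
dup_normalized = df[normalized.duplicated(keep=False)]

print(dup_comments)
print(dup_rows)
print(dup_normalized)
```

With keep=False, all copies of a repeated comment stay in the result. Passing keep="first" or keep="last" instead would leave one copy out, which is useful for dropping duplicates but not for listing them.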