python pandas dataframe group-by duplicates

Grouping by multiple columns to find duplicate rows in pandas


I have a DataFrame df:

id    val1     val2
 1     1.1      2.2
 1     1.1      2.2
 2     2.1      5.5
 3     8.8      6.2
 4     1.1      2.2
 5     8.8      6.2

I want to group by val1 and val2 and get a similar DataFrame containing only the rows whose val1 and val2 combination occurs more than once.

Final df:

id    val1     val2
 1     1.1      2.2
 4     1.1      2.2
 3     8.8      6.2
 5     8.8      6.2

Solution

  • Use duplicated with the subset parameter to name the columns to check and keep=False to mark every occurrence of a duplicate, then filter with the resulting mask by boolean indexing:

    df1 = df[df.duplicated(subset=['val1','val2'], keep=False)]
    print(df1)
       id  val1  val2
    0   1   1.1   2.2
    1   1   1.1   2.2
    3   3   8.8   6.2
    4   4   1.1   2.2
    5   5   8.8   6.2
    

    Detail: the boolean mask, computed on the unfiltered df:

    print(df.duplicated(subset=['val1','val2'], keep=False))
    0     True
    1     True
    2    False
    3     True
    4     True
    5     True
    dtype: bool
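
For reference, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the duplicated approach above, next to a groupby-based equivalent, since the question asks about grouping. The variable names (mask, dupes, dupes_groupby, final) are illustrative only. The last step is an assumption about how the "Final df" in the question was obtained: it drops fully identical rows and sorts so that equal val1/val2 pairs sit together.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

# Approach from the answer: flag every row whose (val1, val2) pair repeats.
mask = df.duplicated(subset=["val1", "val2"], keep=False)
dupes = df[mask]

# groupby-based equivalent: compute the size of each (val1, val2) group,
# broadcast it back to the rows, and keep rows from groups larger than one.
group_size = df.groupby(["val1", "val2"])["id"].transform("size")
dupes_groupby = df[group_size > 1]

# Assumption: to mirror the question's "Final df", drop fully identical rows,
# then sort by the pair, using id as a tie-breaker within each pair.
final = dupes.drop_duplicates().sort_values(["val1", "val2", "id"])
print(final)
```

Both selections return the same rows here. The duplicated version is usually the shorter one to write; the groupby version may be easier to adapt if the threshold changes, for example keeping only pairs that occur at least three times.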