Search code examples
pythonpandasrowdata-sciencedrop-duplicates

How to drop duplicate rows based on values of two columns?


I have a data frame like this:

Category Date_1       Score_1    Date_2           Score_2
  A      13/11/2019    5        13/11/2019        10
  A      13/11/2019    5        14/11/2019        55
  A      13/11/2019    5        15/11/2019        45
  A      13/11/2019    5        16/11/2019        80
  A      14/11/2019    3        13/11/2019        10
  A      14/11/2019    3        14/11/2019        55
  A      14/11/2019    3        15/11/2019        45
  A      14/11/2019    3        16/11/2019        80
  A      15/11/2019    7        13/11/2019        10
  A      15/11/2019    7        14/11/2019        55
  A      15/11/2019    7        15/11/2019        45
  A      15/11/2019    7        16/11/2019        80
  B      13/11/2019    4        13/11/2019        18
  B      13/11/2019    4        14/11/2019        65
  B      13/11/2019    4        15/11/2019        75
  B      13/11/2019    4        16/11/2019        89
  B      14/11/2019    9        13/11/2019        18
  B      14/11/2019    9        14/11/2019        65
  B      14/11/2019    9        15/11/2019        75
  B      14/11/2019    9        16/11/2019        89
  B      15/11/2019    8        13/11/2019        18
  B      15/11/2019    8        14/11/2019        65
  B      15/11/2019    8        15/11/2019        75
  B      15/11/2019    8        16/11/2019        89

I want to keep the rows where both dates are same. I was doing this:

df.drop_duplicates(subset=['Date_1', 'Date_2'])

But it do not work. Can`t figure out how to drop those extra rows?


Solution

  • Use boolean indexing with compare both columns:

    df1 = df[df['Date_1'] == df['Date_2'])
    

    Or DataFrame.query:

    df1 = df.query("Date_1 == Date_2")