Search code examples
sqlduplicatessubquery

SQL - Remove Duplicate Rows Based on a Single Column


I have a dataset with duplicate IDs, different scores, and different days like this

enter image description here

I want one row per ID based on the maximum value of day. This is the output I want

enter image description here

The query I currently have is not deduplicating because there are multiple scores per ID. I am trying to use a subquery in the where clause like this

select distinct
    ID,
    score, 
    day
from 
    table A
where 
    day in (select max(day)
            from table A
            group by ID)

The subquery obviously returns only one ID per row, but when I add in the score, I am still getting duplicates.

Any ideas are greatly appreciated!


Solution

  • The following uses a common table expression, which is supported by most DBMS, for getting the max day per ID. Then join to it on both ID and Day.

    with maxCTE as (
     select id, max(day) as max_day
       from table1
      group by id  
      )
    select t.*
      from table1 t 
      join maxCTE m 
        on t.id = m.id 
       and t.day = m.max_day