Search code examples
postgresql

Keep only duplicates including original value


I am using the row_number() functon to identify duplicates, partitioned by ID and Name:

SELECT *
, ROW_NUMBER() OVER(PARTITION BY "ID", "Name" ORDER BY "date" ASC) as row_number
from table; 

My data looks like:

ID Name date row_number
1 Name1 2024-11-19 2
1 Name1 2024-11-18 1
2 Name1 2024-11-18 1
3 Name2 2024-11-19 1
4 Name3 2024-11-12 3
4 Name3 2024-11-11 2
4 Name3 2024-11-10 1
10 Name7 2024-11-10 1

Now I want to keep only the rows where we have multiple times the same ID and Name combination. So the third, fourth and last row should be removed and the output should be as below. How can this be done?

ID Name date row_number
1 Name1 2024-11-19 2
1 Name1 2024-11-18 1
4 Name3 2024-11-12 3
4 Name3 2024-11-11 2
4 Name3 2024-11-10 1

Solution

  • I would use COUNT() here as a window function, with the same partition as you were using with ROW_NUMBER():

    WITH cte AS (
        SELECT *, COUNT(*) OVER (PARTITION BY ID, Name) cnt,
                  ROW_NUMBER() OVER (PARTITION BY ID, Name ORDER BY "date") rn
        FROM yourTable
    )
    
    SELECT ID, Name, "date", rn
    FROM cte
    WHERE cnt > 1
    ORDER BY ID, "date" DESC;