I am using the row_number() functon to identify duplicates, partitioned by ID and Name:
SELECT *
, ROW_NUMBER() OVER(PARTITION BY "ID", "Name" ORDER BY "date" ASC) as row_number
from table;
My data looks like:
ID | Name | date | row_number |
---|---|---|---|
1 | Name1 | 2024-11-19 | 2 |
1 | Name1 | 2024-11-18 | 1 |
2 | Name1 | 2024-11-18 | 1 |
3 | Name2 | 2024-11-19 | 1 |
4 | Name3 | 2024-11-12 | 3 |
4 | Name3 | 2024-11-11 | 2 |
4 | Name3 | 2024-11-10 | 1 |
10 | Name7 | 2024-11-10 | 1 |
Now I want to keep only the rows where we have multiple times the same ID and Name combination. So the third, fourth and last row should be removed and the output should be as below. How can this be done?
ID | Name | date | row_number |
---|---|---|---|
1 | Name1 | 2024-11-19 | 2 |
1 | Name1 | 2024-11-18 | 1 |
4 | Name3 | 2024-11-12 | 3 |
4 | Name3 | 2024-11-11 | 2 |
4 | Name3 | 2024-11-10 | 1 |
I would use COUNT()
here as a window function, with the same partition as you were using with ROW_NUMBER()
:
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY ID, Name) cnt,
ROW_NUMBER() OVER (PARTITION BY ID, Name ORDER BY "date") rn
FROM yourTable
)
SELECT ID, Name, "date", rn
FROM cte
WHERE cnt > 1
ORDER BY ID, "date" DESC;