I loaded some data into a SQL Server table from a .CSV
file for test purposes, I don't have any primary key, unique key or auto-generated ID in that table.
Helow is an example of the situation:
select *
from people
where name in (select name
from people
group by name
having count(name) > 1)
When I run this query, I get these results:
The goal is to keep one row and remove other duplicate rows.
Is there any way other than save the content somewhere else, delete all duplicate rows and insert a new one?
Thanks for helping!
You could use an updatable CTE for this.
If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):
with cte as (
select row_number() over(partition by name, age, gender order by (select null)) rn
from people
)
delete from cte where rn > 1
If you want to delete duplicates on name
only (as shown in your existing query):
with cte as (
select row_number() over(partition by name order by (select null)) rn
from people
)
delete from cte where rn > 1