sql

Delete all similar rows except first one


I have a table that contains several rows with the same values but different timestamps. This was caused by an error, and now I want to delete the incorrect rows. There are too many to delete manually. One column holds a counter that is reset daily, so similar rows can also legitimately appear on different days. That is why I only want to delete similar rows that were written to the database within one hour after the first entry. The rows must all contain the same values in the mode and value columns.

This is what my datatable looks like:

id  timestamp            mode          value
1   2024-08-20 08:45:30  finished/X/2  55
2   2024-08-20 08:46:20  finished/X/2  55
3   2024-08-20 08:47:50  finished/X/2  55
4   2024-08-20 14:45:30  finished/Z/5  67
5   2024-08-20 14:47:15  finished/Z/5  67
6   2024-08-20 14:50:25  finished/Z/5  67
7   2024-08-21 10:20:45  finished/X/2  32
8   2024-08-21 10:21:30  finished/X/2  32
9   2024-08-21 12:20:55  finished/X/2  55

In this case, I would like the query to delete the rows with id 2, 3, 5, 6, and 8.

I have already tried to filter the data with a join but got no result.


Solution

  • Here is an updated SQL-Server based solution:

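    -- Keep only the earliest row per (value, mode, calendar day)
    SELECT id, timestamp, mode, value FROM (
     SELECT *,
            -- rn = 1 marks the first row of each group, ordered by time
            ROW_NUMBER() OVER (PARTITION BY value, mode, CAST(timestamp AS date) ORDER BY timestamp) rn
     FROM #tmp
     ) tbl WHERE rn = 1 ORDER BY id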
    

    (See demo here: https://data.stackexchange.com/stackoverflow/query/1858218/filter-out-the-first-values-of-repeated-entries )

    Admittedly, this query does not delete the duplicate rows from your table; it only returns the rows you want to keep. Note also that it groups rows by calendar day rather than by a one-hour window, which gives the desired result for the sample data. You could insert the resulting rows into a new table, rename the tables accordingly, and eventually drop the old one. This would give you the time and opportunity to check the newly created table before putting it into use.
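
    If you would rather delete in place, the same ranking can be used inside a CTE. The following is only a sketch, assuming SQL Server and the same #tmp table as above; the CTE name `ranked` is made up for illustration. It is worth running it inside a transaction or on a copy of the table first.

    ```sql
    -- The statement before WITH must be terminated with a semicolon.
    WITH ranked AS (
        SELECT id,
               -- Same grouping as the SELECT above: value, mode, calendar day
               ROW_NUMBER() OVER (PARTITION BY value, mode, CAST(timestamp AS date)
                                  ORDER BY timestamp) AS rn
        FROM #tmp
    )
    -- Deleting through the CTE removes the underlying rows of #tmp;
    -- everything except the first row of each group goes away.
    DELETE FROM ranked
    WHERE rn > 1;
    ```

    The question asks for a one-hour window rather than a calendar day. One possible reading of that rule, again only as a sketch, is to delete every row that has an earlier row with the same mode and value at most one hour before it. Be aware that this chains: a long run of duplicates, each less than an hour after the previous one, is removed entirely except for its first row, even if the run spans more than an hour in total.

    ```sql
    -- "earlier" is a hypothetical alias for the same table, used to look back in time.
    DELETE t
    FROM #tmp AS t
    WHERE EXISTS (
        SELECT 1
        FROM #tmp AS earlier
        WHERE earlier.mode  = t.mode
          AND earlier.value = t.value
          -- strictly earlier, so rows with identical timestamps do not delete each other
          AND earlier.timestamp <  t.timestamp
          AND earlier.timestamp >= DATEADD(HOUR, -1, t.timestamp)
    );
    ```

    On the sample data both statements should remove the rows with id 2, 3, 5, 6, and 8 and keep row 9, because its nearest match (row 3) is more than a day older.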