Search code examples
sqlsql-servercommon-table-expressionsql-deletedate-arithmetic

SQL Server database delete duplicates trend values, but leave first and last


I have large database full of temperature values. Problem is that system saves lot of duplicate values, and now the SQL Server database is full (Express edition).

Part of sample of data, using query

SELECT [PointID], [Time], [Data] 
FROM TrendSamples 
WHERE pointid = 13 
ORDER BY time DESC

Result:

PointID Time       Data
---------------------------------------------
13  2020-01-02 07:29:01.077 22,2999992370605
13  2020-01-02 07:28:50.937 22,5
13  2020-01-02 07:28:05.230 22,2999992370605
13  2020-01-02 07:27:55.090 22,3999996185303 
13  2020-01-02 07:27:04.510 22,3999996185303
13  2020-01-02 07:26:13.443 22,3999996185303
13  2020-01-02 07:25:22.580 22,3999996185303
13  2020-01-02 07:24:31.340 22,3999996185303
13  2020-01-02 07:23:40.370 22,3999996185303
13  2020-01-02 07:22:49.460 22,3999996185303
13  2020-01-02 07:21:59.160 22,3999996185303
13  2020-01-02 07:21:08.483 22,3999996185303
13  2020-01-02 07:20:17.713 22,3999996185303
13  2020-01-02 07:19:26.710 22,3999996185303
13  2020-01-02 07:18:35.283 22,3999996185303
13  2020-01-02 07:17:44.250 22,3999996185303
13  2020-01-02 07:16:53.463 22,3999996185303
13  2020-01-02 07:16:02.367 22,3999996185303
13  2020-01-02 07:15:11.083 22,3999996185303
13  2020-01-02 07:14:19.987 22,3999996185303
13  2020-01-02 07:13:29.230 22,3999996185303
13  2020-01-02 07:12:38.197 22,3999996185303
13  2020-01-02 07:11:47.957 22,3999996185303
13  2020-01-02 07:10:57.033 22,3999996185303
13  2020-01-02 07:10:06.293 22,3999996185303
13  2020-01-02 07:09:15.183 22,3999996185303
13  2020-01-02 07:08:24.083 22,3999996185303
13  2020-01-02 07:07:33.237 22,3999996185303
13  2020-01-02 07:06:42.140 22,3999996185303
13  2020-01-02 07:05:51.557 22,3999996185303
13  2020-01-02 07:05:00.787 22,3999996185303
13  2020-01-02 07:04:09.707 22,3999996185303
13  2020-01-02 07:03:18.970 22,3999996185303
13  2020-01-02 07:02:28.043 22,3999996185303
13  2020-01-02 07:01:36.930 22,3999996185303
13  2020-01-02 07:00:46.317 22,3999996185303
13  2020-01-02 06:59:55.390 22,3999996185303
13  2020-01-02 06:59:04.403 22,3999996185303
13  2020-01-02 06:58:13.103 22,3999996185303
13  2020-01-02 06:58:01.247 22,5

Like you see, there is lot of duplicate data, where data has not changed at all between times. Is there any way to delete all duplicate data, but LEAVE FIRST AND LAST row before and after value change.

Result what I want is this

PointID Time    Data
---------------------------------------------
13  2020-01-02 07:29:01.077 22,2999992370605
13  2020-01-02 07:28:50.937 22,5
13  2020-01-02 07:28:05.230 22,2999992370605
13  2020-01-02 07:27:55.090 22,3999996185303 
13  2020-01-02 06:58:13.103 22,3999996185303
13  2020-01-02 06:58:01.247 22,5

Solution

  • You could use an updatable CTE for this:

    with cte as (
        select 
            t.*, 
            lag(data) over(partition by pointID order by time) lag_data,
            lead(data) over(partition by pointID order by time) lead_data
        from mytable t
    )
    delete from cte where (data = lag_data and data = lead_data)
    

    The CTE uses lag() and lead() to bring in the value of data on the previous and next rows that have the same pointID, ordered by time. Then the outer query deletes records where data is the same as on the previous and next records.