I have large database full of temperature values. Problem is that system saves lot of duplicate values, and now the SQL Server database is full (Express edition).
Part of sample of data, using query
SELECT [PointID], [Time], [Data]
FROM TrendSamples
WHERE pointid = 13
ORDER BY time DESC
Result:
PointID Time Data
---------------------------------------------
13 2020-01-02 07:29:01.077 22,2999992370605
13 2020-01-02 07:28:50.937 22,5
13 2020-01-02 07:28:05.230 22,2999992370605
13 2020-01-02 07:27:55.090 22,3999996185303
13 2020-01-02 07:27:04.510 22,3999996185303
13 2020-01-02 07:26:13.443 22,3999996185303
13 2020-01-02 07:25:22.580 22,3999996185303
13 2020-01-02 07:24:31.340 22,3999996185303
13 2020-01-02 07:23:40.370 22,3999996185303
13 2020-01-02 07:22:49.460 22,3999996185303
13 2020-01-02 07:21:59.160 22,3999996185303
13 2020-01-02 07:21:08.483 22,3999996185303
13 2020-01-02 07:20:17.713 22,3999996185303
13 2020-01-02 07:19:26.710 22,3999996185303
13 2020-01-02 07:18:35.283 22,3999996185303
13 2020-01-02 07:17:44.250 22,3999996185303
13 2020-01-02 07:16:53.463 22,3999996185303
13 2020-01-02 07:16:02.367 22,3999996185303
13 2020-01-02 07:15:11.083 22,3999996185303
13 2020-01-02 07:14:19.987 22,3999996185303
13 2020-01-02 07:13:29.230 22,3999996185303
13 2020-01-02 07:12:38.197 22,3999996185303
13 2020-01-02 07:11:47.957 22,3999996185303
13 2020-01-02 07:10:57.033 22,3999996185303
13 2020-01-02 07:10:06.293 22,3999996185303
13 2020-01-02 07:09:15.183 22,3999996185303
13 2020-01-02 07:08:24.083 22,3999996185303
13 2020-01-02 07:07:33.237 22,3999996185303
13 2020-01-02 07:06:42.140 22,3999996185303
13 2020-01-02 07:05:51.557 22,3999996185303
13 2020-01-02 07:05:00.787 22,3999996185303
13 2020-01-02 07:04:09.707 22,3999996185303
13 2020-01-02 07:03:18.970 22,3999996185303
13 2020-01-02 07:02:28.043 22,3999996185303
13 2020-01-02 07:01:36.930 22,3999996185303
13 2020-01-02 07:00:46.317 22,3999996185303
13 2020-01-02 06:59:55.390 22,3999996185303
13 2020-01-02 06:59:04.403 22,3999996185303
13 2020-01-02 06:58:13.103 22,3999996185303
13 2020-01-02 06:58:01.247 22,5
Like you see, there is lot of duplicate data, where data has not changed at all between times. Is there any way to delete all duplicate data, but LEAVE FIRST AND LAST row before and after value change.
Result what I want is this
PointID Time Data
---------------------------------------------
13 2020-01-02 07:29:01.077 22,2999992370605
13 2020-01-02 07:28:50.937 22,5
13 2020-01-02 07:28:05.230 22,2999992370605
13 2020-01-02 07:27:55.090 22,3999996185303
13 2020-01-02 06:58:13.103 22,3999996185303
13 2020-01-02 06:58:01.247 22,5
You could use an updatable CTE for this:
with cte as (
select
t.*,
lag(data) over(partition by pointID order by time) lag_data,
lead(data) over(partition by pointID order by time) lead_data
from mytable t
)
delete from cte where (data = lag_data and data = lead_data)
The CTE uses lag()
and lead()
to bring in the value of data
on the previous and next rows that have the same pointID
, ordered by time
. Then the outer query deletes records where data
is the same as on the previous and next records.