I'm using SQL Server 2016 and have a very large table containing millions of rows of readings, recorded by different sources at irregular intervals over several years. The table cannot be altered. Typical data looks like this:
Reading_ID  Source  Date                 Reading
==========  ======  ===================  =======
1           1       2023/01/01 00:04:00  7
2           1       2023/01/01 00:10:00  3
3           2       2023/01/01 00:15:00  8
4           1       2023/01/01 01:00:00  2
5           2       2023/01/01 01:03:00  15
The table has CONSTRAINT [PK_DATA_READINGS] PRIMARY KEY CLUSTERED ([Source] ASC, [Date] ASC). Source can be any number; it is not fixed or known in advance, and new sources can start at any time.
What I want to do is specify a date range and an interval in hours, then get just one reading from each source every X hours. For example, in the data above, row 2 wouldn't be returned because it's too close to row 1.
I've tried something like the following -
DECLARE @Start_Date DATETIME = '2023/01/01 00:00:00',
@End_Date DATETIME = '2023/02/01 00:00:00',
@Interval_Hours INT = 4
;WITH HOURLY_DATA AS (
SELECT d.Source,
d.Date,
d.Reading,
ROW_NUMBER() OVER (PARTITION BY d.Source, DATEDIFF(HOUR, @Start_Date, d.DATE) / @Interval_Hours ORDER BY d.SOURCE, d.DATE) AS SOURCE_HOUR_ROW
FROM data_readings d
WHERE d.DATE BETWEEN @Start_Date AND @End_Date
)
SELECT h.Source,
h.Date,
h.Reading
FROM HOURLY_DATA h
WHERE h.SOURCE_HOUR_ROW = 1
But it's still very slow to execute, sometimes taking 5 minutes or more to complete. I would like a faster way to get this data. I've looked at the execution plan, but can't see an obvious solution.
Thank you for looking.
The slowness is caused by the volume of data in the CTE. I found this solution, which works faster: How to sample records by time
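For anyone landing here later: one alternative that can sometimes help with a clustered key on (Source, Date) is to generate the interval buckets up front and seek the first reading per source and bucket with CROSS APPLY and TOP (1), instead of numbering every row in the range. This is only a rough sketch and not necessarily what the linked answer does. The CTE names N, Buckets and Sources are made up for illustration, and it assumes @Start_Date falls on an exact hour (so the buckets line up with the DATEDIFF(HOUR, ...) grouping in the query above) and that @End_Date is not before @Start_Date.

```sql
DECLARE @Start_Date     DATETIME = '2023-01-01T00:00:00',
        @End_Date       DATETIME = '2023-02-01T00:00:00',
        @Interval_Hours INT      = 4;

WITH N AS (
    -- Bucket numbers 0, 1, 2, ... covering the whole date range.
    -- sys.all_objects is used only as a convenient row source.
    SELECT TOP (DATEDIFF(HOUR, @Start_Date, @End_Date) / @Interval_Hours + 1)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS INT) AS n
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
),
Buckets AS (
    -- Half-open window [Bucket_Start, Bucket_End) for each bucket number.
    SELECT DATEADD(HOUR, n * @Interval_Hours, @Start_Date)       AS Bucket_Start,
           DATEADD(HOUR, (n + 1) * @Interval_Hours, @Start_Date) AS Bucket_End
    FROM N
),
Sources AS (
    -- Sources are not known in advance, so read them from the table.
    -- This still reads the index once; a separate list of sources,
    -- if one exists, would avoid that.
    SELECT DISTINCT d.Source
    FROM data_readings d
)
SELECT s.Source,
       r.[Date],
       r.Reading
FROM Sources s
CROSS JOIN Buckets b
CROSS APPLY (
    -- Earliest reading for this source inside this bucket, if any.
    -- The predicates on Source and Date match the clustered key order.
    SELECT TOP (1) d.[Date], d.Reading
    FROM data_readings d
    WHERE d.Source = s.Source
      AND d.[Date] >= b.Bucket_Start
      AND d.[Date] <  b.Bucket_End
      AND d.[Date] <= @End_Date
    ORDER BY d.[Date]
) r
ORDER BY s.Source, r.[Date];
```

Whether this beats the ROW_NUMBER() version depends on the data. It does one lookup per source per bucket, so it should pay off mainly when there are many readings per bucket; with lots of sources and mostly empty buckets it could well be slower, so compare the execution plans before relying on it.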