I have a SQL table that has timestamped records for server performance data. This data is polled and stored every 1 minute for multiple servers. I want to keep data for a large period of time but reduce the number records for data older than six months.
For example, I have some old records like so:
Timestamp Server CPU App1 App2
1 ... 00:01 Host1 5 1 10
2 ... 00:01 Host2 10 5 20
3 ... 00:02 Host1 6 0 11
4 ... 00:02 Host2 11 5 20
5 ... 00:03 Host1 4 1 9
6 ... 00:04 Host2 9 6 19
I want to be able to reduce this data from every minute to every 10 minutes or possibly every hour for older data.
My initial assumption is that I'd average the values for times within a 10 minute time period and create a new timestamped record after deleting the old records. Could I create a sql query that generates the insert statements for the new summarized records? What would that query look like?
Or is there a better way to accomplish this summarization job?
You might also want to consider moving the summarized information into a different table so you don't end up in a situation where you're wondering if you're looking at "raw" or summarized data. Other benefits would be that you could include MAX, MIN, STDDEV and other values along with the AVG.
The tricky part is chunking out the times. The best way I could think of was to start with the output from the CONVERT(blah, Timestamp, 120)
function:
-- Result: 2015-07-08 20:50:55
SELECT CONVERT(VARCHAR(19), CURRENT_TIMESTAMP, 120)
By cutting it off after the hour or after the 10-minute point you can truncate the times:
-- Hour; result is 2015-07-08 20
SELECT CONVERT(VARCHAR(13), CURRENT_TIMESTAMP, 120)
-- 10-minute point; result is 2015-07-08 20:50:5
SELECT CONVERT(VARCHAR(15), CURRENT_TIMESTAMP, 120)
With a little more massaging you can fill out the minutes for either one and CAST
it back to a DATETIME
or DATETIME2
:
-- Hour increment
CAST(CONVERT(VARCHAR(13), CURRENT_TIMESTAMP, 120) + ':00' AS DATETIME)
-- 10-minute increment
CAST(CONVERT(VARCHAR(15), CURRENT_TIMESTAMP, 120) + 0' AS DATETIME)
Using the logic above, all times are truncated. In other words, the hour formula will convert Timestamp
where 11:00 <= Timestamp < 12:00
to 11:00. The minute formula will convert Timestamp
where 11:20 <= Timestamp < 11:30
to 11:20.
So the better part query looks like this (I've left out getting rid of the rows you've just summarized):
-- The hour-increment version
INSERT INTO myTableOrOtherTable
SELECT
CAST(CONVERT(VARCHAR(13), [Timestamp], 120) + ':00' AS DATETIME),
AVG(CPU),
AVG(App1),
AVG(App2)
FROM myTable
GROUP BY
CAST(CONVERT(VARCHAR(13), [Timestamp], 120) + ':00' AS DATETIME)