Search code examples
sqlsql-serverrefactoring-databases

Reduce/Summarize and Replace Timestamped Records


I have a SQL table that has timestamped records for server performance data. This data is polled and stored every 1 minute for multiple servers. I want to keep data for a large period of time but reduce the number records for data older than six months.

For example, I have some old records like so:

    Timestamp  Server  CPU  App1  App2
1   ... 00:01  Host1   5    1     10    
2   ... 00:01  Host2   10   5     20
3   ... 00:02  Host1   6    0     11
4   ... 00:02  Host2   11   5     20
5   ... 00:03  Host1   4    1     9
6   ... 00:04  Host2   9    6     19

I want to be able to reduce this data from every minute to every 10 minutes or possibly every hour for older data.

My initial assumption is that I'd average the values for times within a 10 minute time period and create a new timestamped record after deleting the old records. Could I create a sql query that generates the insert statements for the new summarized records? What would that query look like?

Or is there a better way to accomplish this summarization job?


Solution

  • You might also want to consider moving the summarized information into a different table so you don't end up in a situation where you're wondering if you're looking at "raw" or summarized data. Other benefits would be that you could include MAX, MIN, STDDEV and other values along with the AVG.

    The tricky part is chunking out the times. The best way I could think of was to start with the output from the CONVERT(blah, Timestamp, 120) function:

    -- Result: 2015-07-08 20:50:55
    SELECT CONVERT(VARCHAR(19), CURRENT_TIMESTAMP, 120)
    

    By cutting it off after the hour or after the 10-minute point you can truncate the times:

    -- Hour; result is 2015-07-08 20
    SELECT CONVERT(VARCHAR(13), CURRENT_TIMESTAMP, 120)
    
    -- 10-minute point; result is 2015-07-08 20:50:5
    SELECT CONVERT(VARCHAR(15), CURRENT_TIMESTAMP, 120)
    

    With a little more massaging you can fill out the minutes for either one and CAST it back to a DATETIME or DATETIME2:

    -- Hour increment
    CAST(CONVERT(VARCHAR(13), CURRENT_TIMESTAMP, 120) + ':00' AS DATETIME)
    
    -- 10-minute increment
    CAST(CONVERT(VARCHAR(15), CURRENT_TIMESTAMP, 120) + 0' AS DATETIME)
    

    Using the logic above, all times are truncated. In other words, the hour formula will convert Timestamp where 11:00 <= Timestamp < 12:00 to 11:00. The minute formula will convert Timestamp where 11:20 <= Timestamp < 11:30 to 11:20.

    So the better part query looks like this (I've left out getting rid of the rows you've just summarized):

    -- The hour-increment version
    INSERT INTO myTableOrOtherTable
    SELECT
      CAST(CONVERT(VARCHAR(13), [Timestamp], 120) + ':00' AS DATETIME),
      AVG(CPU),
      AVG(App1),
      AVG(App2)
    FROM myTable
    GROUP BY
      CAST(CONVERT(VARCHAR(13), [Timestamp], 120) + ':00' AS DATETIME)