Search code examples
sqlvertica

SQL to get data on top of the hour from a time series database


I have a third party app that writes to Vertica database every 5 minutes. As a result, a sample table looks like this:

CREATE TABLE sample (
    item_id int,
    metric_val float,
    ts timestamp
);

-- Hypothetical sample values in 2nd column; these can be any values
INSERT INTO sample VALUES(1, 11.0, '2022-03-29 00:00:00')
INSERT INTO sample VALUES(1, 11.1, '2022-03-29 00:05:00')
INSERT INTO sample VALUES(1, 11.2, '2022-03-29 00:10:00')
INSERT INTO sample VALUES(1, 11.3, '2022-03-29 00:15:00')
INSERT INTO sample VALUES(1, 11.4, '2022-03-29 00:20:00')
INSERT INTO sample VALUES(1, 11.5, '2022-03-29 00:25:00')
INSERT INTO sample VALUES(1, 11.6, '2022-03-29 00:30:00')
...
...
INSERT INTO sample VALUES(1, 12.1, '2022-03-29 01:00:00')
INSERT INTO sample VALUES(1, 12.2, '2022-03-29 01:05:00')
...
INSERT INTO sample VALUES(1, 13.1, '2022-03-29 02:00:00')
INSERT INTO sample VALUES(1, 13.2, '2022-03-29 02:05:00')

As a result, there are 288 (24 hours * 12 entries each hour) rows for each day for a given item. I want to retrieve the records at the top of each hour i.e.

1, 11.0, 2022-03-29 00:00:00
1, 12.0, 2022-03-29 01:00:00
1, 13.0, 2022-03-29 02:00:00
...
1, 101.0, 2022-03-30 00:00:00
1, 102.0, 2022-03-30 01:00:00

I tried the below query but the challenge is to increment the value of 'n'

WITH a AS (
    SELECT item_id, metric_val, ts, ROW_NUMBER() OVER (PARTITION BY ts, HOUR(ts) ORDER BY ts) AS n
    FROM sample WHERE item_id = 1
)
SELECT * FROM a WHERE n = 1

The Vertica TIME_SLICE function seems promising but I couldn't make that work even after multiple attempts. Can this please be advised?

SELECT version();
Vertica Analytic Database v10.1.1-0

Solution

  • Seems pretty simple - or do I miss something?

    Just filter out the rows whose ts truncated to the hour ('HH') is equal to ts ...

    WITH sample (item_id, metric_val, ts) AS (                                                                                                                  
    -- Hypothetical sample values in 2nd column; these can be any values
                SELECT 1, 11.0, TIMESTAMP '2022-03-29 00:00:00'
      UNION ALL SELECT 1, 11.1, TIMESTAMP '2022-03-29 00:05:00'
      UNION ALL SELECT 1, 11.2, TIMESTAMP '2022-03-29 00:10:00'
      UNION ALL SELECT 1, 11.3, TIMESTAMP '2022-03-29 00:15:00'
      UNION ALL SELECT 1, 11.4, TIMESTAMP '2022-03-29 00:20:00'
      UNION ALL SELECT 1, 11.5, TIMESTAMP '2022-03-29 00:25:00'
      UNION ALL SELECT 1, 11.6, TIMESTAMP '2022-03-29 00:30:00'
      UNION ALL SELECT 1, 12.1, TIMESTAMP '2022-03-29 01:00:00'
      UNION ALL SELECT 1, 12.2, TIMESTAMP '2022-03-29 01:05:00'
      UNION ALL SELECT 1, 13.1, TIMESTAMP '2022-03-29 02:00:00'
      UNION ALL SELECT 1, 13.2, TIMESTAMP '2022-03-29 02:05:00'
    )
    SELECT
      *
    FROM sample
    WHERE TRUNC(ts,'HH') = ts;
    -- out  item_id | metric_val |         ts          
    -- out ---------+------------+---------------------
    -- out        1 |       11.0 | 2022-03-29 00:00:00
    -- out        1 |       12.1 | 2022-03-29 01:00:00
    -- out        1 |       13.1 | 2022-03-29 02:00:00