Hey folks: I have the following table in a vertica DB:
+-----+------+----------+
| Tid | item | time_sec |
+-----+------+----------+
| 1 | A | 1 |
| 1 | B | 2 |
| 1 | C | 4 |
| 1 | D | 5 |
| 1 | E | 6 |
| 2 | A | 5 |
| 2 | E | 5 |
+-----+------+----------+
My goal is to create new item groups that lie within a time window deltaT. Meaning that the difference between the first and last item's timestamp is smaller or equal to deltaT. Example: if deltaT = 2 sec we would get the new table:
+-----+------+
| Tid | item |
+-----+------+
| 11 | A |
| 11 | B |
| 12 | B |
| 12 | C |
| 13 | C |
| 13 | D |
| 13 | E |
| 14 | D |
| 14 | E |
| 15 | E |
| 21 | A |
| 21 | E |
+-----+------+
Here is the walk through of the table: First we inspect all items with the Tid 1, and create sub groups with Tid 1n, where n is a counter. Our first sub group with the Tid 11 consists of item A, B since deltaT between the last and first item is =<2. The next group has Tid 12 with item B,C. The group after that one has the Tid 13 and items C,D,E since all items are within a time span of 2 seconds. This goes on until the last item with Tid 1. Than we start over with the group that has Tid 2.
The new Tid numbering for the sub groups can be continous (1...6), I just choose this kind of numbering to show the relation to the original table.
I am looking at the vertica functions LAG and Time_slice but cannot figure out a way how to handle such a problem elegantly.
This is how far I got - and it does not answer your question, really. But it could constitute a few pointers:
WITH
-- your input
input(Tid,item,time_sec) AS (
SELECT 1,'A',1
UNION ALL SELECT 1,'B',2
UNION ALL SELECT 1,'C',4
UNION ALL SELECT 1,'D',5
UNION ALL SELECT 1,'E',6
UNION ALL SELECT 2,'A',5
UNION ALL SELECT 2,'E',5
)
-- end of your input, start your "real" WITH clause here
,
input_w_ts AS (
SELECT
*
, TIMESTAMPADD('SECOND',time_sec-1,TIMESTAMP '2000-01-01 00:00:00') AS ts
FROM input
)
SELECT
TS_LAST_VALUE(Tid) AS Tid
, item
, TS_LAST_VALUE(time_sec) AS time_sec
, tsr
FROM input_w_ts
TIMESERIES tsr AS '2 SECONDS' OVER (PARTITION BY item ORDER BY ts)
ORDER BY 1,4
;
Output:
Tid|item|time_sec|tsr
1|A | 1|2000-01-01 00:00:00
1|B | 2|2000-01-01 00:00:00
1|A | 1|2000-01-01 00:00:02
1|C | 4|2000-01-01 00:00:02
1|D | 5|2000-01-01 00:00:04
1|E | 6|2000-01-01 00:00:04
2|A | 5|2000-01-01 00:00:04