My input has a field 'condition' with only two values. Lets assume only values 'A' or 'B'.
When at least a single event with condition=A within a tumbling window is found, only events with condition=A should be outputted. However, when no events for A are found, only events with B should be outputted in the same window. Given the following input with a tumbling window of 4 ticks:
Condition Time
----------- ------
A T1
B T2
A T3
B T5
B T6
B T7
B T8
B T10
A T11
A T12
A T13
A T14
A T15
The output should be as follows:
Condition Time (Window)
----------- ------ ----------
A T1 T1-3
A T3 T1-3
B T5 T5-8
B T6 T5-8
B T7 T5-8
B T8 T5-8
A T11 T9-12
A T12 T9-12
A T13 T13-16
A T14 T13-16
A T15 T13-16
How can I setup my steps so the following output is achieved from my input? I tried several option with using group but was unsuccessful
This is an interesting problem. First allow me to correct your definition of a window. Windows of 4 ticks for the time range from 0 to 16 are:
( 0 - 4]
( 4 - 8]
( 8 - 12]
(12 - 16]
, where start time is excluded and end time is included. The end time is the timestamp of the result of the computation over that window.
Now here is the query that computes your answer.
WITH
count_as as (
SELECT
cnt = SUM(case cond when 'A' then 1 else 0 end)
FROM input TIMESTAMP BY time
GROUP BY tumblingwindow(second, 4)
)
SELECT
input.cond, input.time
FROM
count_as a
JOIN
input TIMESTAMP BY time
ON DATEDIFF(second, input, a) >= 0 AND DATEDIFF(second, input, a) < 4
WHERE
(a.cnt > 0 AND input.cond = 'A')
OR
(a.cnt = 0 AND input.cond = 'B')
The count_as
step computes number of A's in a window. This will produce an event at each end of the window (4, 8, 12, and 16 seconds in this example) with the count of A's seen in the last 4 sec.
Then we just join it back with input
, but only last 4 seconds of it.
And we need to be careful defining time bounds (aka wiggle room) to correctly align with window boundaries. Thus using >=0 and <4
instead of, say, between
.