Search code examples
azureazure-stream-analytics

Only output events for a condition when at least a single event matching the condition has been found, else output the input


My input has a field 'condition' with only two values. Lets assume only values 'A' or 'B'.

When at least a single event with condition=A within a tumbling window is found, only events with condition=A should be outputted. However, when no events for A are found, only events with B should be outputted in the same window. Given the following input with a tumbling window of 4 ticks:

  Condition   Time  
 ----------- ------ 
  A           T1    
  B           T2    
  A           T3    
  B           T5    
  B           T6    
  B           T7    
  B           T8    
  B           T10   
  A           T11   
  A           T12   
  A           T13   
  A           T14   
  A           T15   

The output should be as follows:

  Condition   Time   (Window)  
 ----------- ------ ---------- 
  A           T1     T1-3      
  A           T3     T1-3      
  B           T5     T5-8      
  B           T6     T5-8      
  B           T7     T5-8      
  B           T8     T5-8      
  A           T11    T9-12     
  A           T12    T9-12     
  A           T13    T13-16    
  A           T14    T13-16    
  A           T15    T13-16    

How can I setup my steps so the following output is achieved from my input? I tried several option with using group but was unsuccessful


Solution

  • This is an interesting problem. First allow me to correct your definition of a window. Windows of 4 ticks for the time range from 0 to 16 are:

    ( 0 -  4]
    ( 4 -  8]
    ( 8 - 12]
    (12 - 16]
    

    , where start time is excluded and end time is included. The end time is the timestamp of the result of the computation over that window.

    Now here is the query that computes your answer.

    WITH
    count_as as (
        SELECT
            cnt = SUM(case cond when 'A' then 1 else 0 end)
        FROM input TIMESTAMP BY time
        GROUP BY tumblingwindow(second, 4)
    )
    
    SELECT
        input.cond, input.time
    FROM
        count_as a
          JOIN
        input TIMESTAMP BY time
          ON DATEDIFF(second, input, a) >= 0 AND DATEDIFF(second, input, a) < 4
    WHERE
        (a.cnt > 0 AND input.cond = 'A')
        OR
        (a.cnt = 0 AND input.cond = 'B')
    

    The count_as step computes number of A's in a window. This will produce an event at each end of the window (4, 8, 12, and 16 seconds in this example) with the count of A's seen in the last 4 sec.

    Then we just join it back with input, but only last 4 seconds of it. And we need to be careful defining time bounds (aka wiggle room) to correctly align with window boundaries. Thus using >=0 and <4 instead of, say, between.