Search code examples
azureazure-stream-analytics

Hopping Window in Azure Stream Analytics


I try to get my head around the hopping window in azure stream analytics. I'll get the following data from an Azure Event Hub:

[
  {
    "Id": "1",
    "SensorData": [
      {
        "Timestamp": 1603112431,
        "Type": "LineCrossing",
        "Direction": "forward"
      },
      {
        "Timestamp": 1603112431,
        "Type": "LineCrossing",
        "Direction": "forward"
      }
    ],
    "EventProcessedUtcTime": "2020-10-20T06:35:48.5890814Z",
    "PartitionId": 1,
    "EventEnqueuedUtcTime": "2020-10-20T06:35:48.3540000Z"
  },
  {
    "Id": "1",
    "SensorData": [
      {
        "Timestamp": 1603112430,
        "Type": "LineCrossing",
        "Direction": "backward"
      }
    ],
    "EventProcessedUtcTime": "2020-10-20T06:35:48.5890814Z",
    "PartitionId": 0,
    "EventEnqueuedUtcTime": "2020-10-20T06:35:48.2140000Z"
  }
]

My query looks like the following:

SELECT s.Id, COUNT(data.ArrayValue.Direction) as Count
FROM [customers] s TIMESTAMP BY EventEnqueuedUtcTime
CROSS APPLY GetArrayElements(s.SensorData) AS data
WHERE data.ArrayValue.Type = 'LineCrossing' 
AND data.ArrayValue.Direction = 'forward'
GROUP BY s.Id, HoppingWindow(second, 3600, 5)

I used a Hopping Window to get every 5th second all events from the last day. My expectation for the given dto would be: One row with Id1 and Count 2, but what I receive is: 720 rows (so 3600 divided by 5) with Id1 has Count 2.

Shouldn't those events not be aggregated by the HoppingWindow function?


Solution

  • I structured your query as it follows:

    with inputValues as (Select input.*, message.ArrayValue as Data from input CROSS APPLY GetArrayElements(input.SensorData) as message)
    
    select inputValues.Id, count(Data.Direction) as Count
    into output
    from inputValues 
    where Data.Type = 'LineCrossing' and Data.Direction='forward'
    GROUP BY inputValues.Id, HoppingWindow(second, 3600, 5)
    

    I have set the input to Event Hub, and in the Visual Studio I have started a query with the cloud input.

    I used a Windows Client application to pipe in the messages to Event Hub(2. from the picture below) and observed that events were coming every 5 seconds(1. from the picture below and 3. from the picture below).

    Maybe just change the query I shared to reflect the correct time-stamping, but the result should be as expected - every 5 seconds count to the output per the defined condition for all events that arrived in the last hour(3600 seconds in the HoppingWindow function).

    enter image description here