I try to get my head around the hopping window in azure stream analytics. I'll get the following data from an Azure Event Hub:
[
{
"Id": "1",
"SensorData": [
{
"Timestamp": 1603112431,
"Type": "LineCrossing",
"Direction": "forward"
},
{
"Timestamp": 1603112431,
"Type": "LineCrossing",
"Direction": "forward"
}
],
"EventProcessedUtcTime": "2020-10-20T06:35:48.5890814Z",
"PartitionId": 1,
"EventEnqueuedUtcTime": "2020-10-20T06:35:48.3540000Z"
},
{
"Id": "1",
"SensorData": [
{
"Timestamp": 1603112430,
"Type": "LineCrossing",
"Direction": "backward"
}
],
"EventProcessedUtcTime": "2020-10-20T06:35:48.5890814Z",
"PartitionId": 0,
"EventEnqueuedUtcTime": "2020-10-20T06:35:48.2140000Z"
}
]
My query looks like the following:
SELECT s.Id, COUNT(data.ArrayValue.Direction) as Count
FROM [customers] s TIMESTAMP BY EventEnqueuedUtcTime
CROSS APPLY GetArrayElements(s.SensorData) AS data
WHERE data.ArrayValue.Type = 'LineCrossing'
AND data.ArrayValue.Direction = 'forward'
GROUP BY s.Id, HoppingWindow(second, 3600, 5)
I used a Hopping Window to get every 5th second all events from the last day. My expectation for the given dto would be: One row with Id1 and Count 2, but what I receive is: 720 rows (so 3600 divided by 5) with Id1 has Count 2.
Shouldn't those events not be aggregated by the HoppingWindow function?
I structured your query as it follows:
with inputValues as (Select input.*, message.ArrayValue as Data from input CROSS APPLY GetArrayElements(input.SensorData) as message)
select inputValues.Id, count(Data.Direction) as Count
into output
from inputValues
where Data.Type = 'LineCrossing' and Data.Direction='forward'
GROUP BY inputValues.Id, HoppingWindow(second, 3600, 5)
I have set the input to Event Hub, and in the Visual Studio I have started a query with the cloud input.
I used a Windows Client application to pipe in the messages to Event Hub(2. from the picture below) and observed that events were coming every 5 seconds(1. from the picture below and 3. from the picture below).
Maybe just change the query I shared to reflect the correct time-stamping, but the result should be as expected - every 5 seconds count to the output per the defined condition for all events that arrived in the last hour(3600 seconds in the HoppingWindow function).