Search code examples
azureazure-iot-hubazure-stream-analytics

Azure Stream Analytics : remove duplicates while aggregating


I'm working on a system of temperature and pressure sensors, where my data is flowing through a Stream analytics job. Now there maybe duplicate messages sent in because of acknowledgements not being received and various other reasons. So my data could be of the format:-

DeviceID    TimeStamp    MeasurementName     Value

1           1            temperature         50
1           1            temperature         50
1           2            temperature         60

Note that the 2nd record is a duplicate of the 1st one as DeviceId and Timestamp and MeasurementName are same. I wish to take an average over 5 min tumbling window for this data in the stream analytics job. So I have this query

SELECT
    AVG(Value)
FROM
    SensorData
GROUP BY
    DeviceId,
    MeasurementName,
    TumblingWindow(minute, 5)

This query is expected to give me average measurement of temperature and pressure values for each device in 5 min. In doing this average I need to eliminate duplicates. The actual average is (50+60)/2 = 55. But the average given my this query will be (50+50+60)/3 = 53.33

How do I tweak this query for the right output?

Thanks in advance.


Solution

  • According to the Query Language Elements in ASA,it seems that distinct is not supported by ASA directly. However, you could find it could be used with COUNT from here.

    So,may be you could refer to my below sql to get avg of Value without duplicate data.

    with temp as
    (
    select count(distinct DeviceID) AS device,
    count(distinct TimeStamp) AS time,
    count(distinct MeasurementName) AS name,
    Value as v
    from jsoninput
    group by Value,TumblingWindow(minute, 5)
    )
    select avg(v) from temp
    group by TumblingWindow(minute, 5)
    

    Output with your sample data:

    enter image description here