I'm working on a system of temperature and pressure sensors, where my data is flowing through a Stream analytics job. Now there maybe duplicate messages sent in because of acknowledgements not being received and various other reasons. So my data could be of the format:-
DeviceID TimeStamp MeasurementName Value
1 1 temperature 50
1 1 temperature 50
1 2 temperature 60
Note that the 2nd record is a duplicate of the 1st one as DeviceId and Timestamp and MeasurementName are same. I wish to take an average over 5 min tumbling window for this data in the stream analytics job. So I have this query
SELECT
AVG(Value)
FROM
SensorData
GROUP BY
DeviceId,
MeasurementName,
TumblingWindow(minute, 5)
This query is expected to give me average measurement of temperature and pressure values for each device in 5 min. In doing this average I need to eliminate duplicates. The actual average is (50+60)/2 = 55. But the average given my this query will be (50+50+60)/3 = 53.33
How do I tweak this query for the right output?
Thanks in advance.
According to the Query Language Elements in ASA,it seems that distinct is not supported by ASA directly. However, you could find it could be used with COUNT
from here.
So,may be you could refer to my below sql to get avg of Value without duplicate data.
with temp as
(
select count(distinct DeviceID) AS device,
count(distinct TimeStamp) AS time,
count(distinct MeasurementName) AS name,
Value as v
from jsoninput
group by Value,TumblingWindow(minute, 5)
)
select avg(v) from temp
group by TumblingWindow(minute, 5)
Output with your sample data: