Search code examples
azure-stream-analytics

In stream analytics query how to order the data based on column before sending it to a UDA function


So I have an azure stream analytics query. It is joining 2 inputs and that needs to be the input of a user defined aggregate function. In the UDA function, I need to do some string concatenation of the incoming rows and output one final string. So the order or the rows from the join is important to be in order of one of the columns which is a string. SA jobs do not allow me to order, so how can I accomplish this. Below is the code extract of the query..

CalcData AS
(
SELECT 
  x.fqn AS fqn,       
  x.value as xvalue,
  y.value as yvalue,
  x.time as time
FROM (select fqn
,value, time from DataInput1 ) y
join
(SELECT
fqn as fqn,
 value as value,
  time as time
FROM DataInput2 ) x on y.time=x.time and x.fqn=y.fqn and DATEDIFF(second, x, y) = 0
--order by time asc, fqn 
),
FormatData AS
(
SELECT UDA.svgstring(CalcData) AS v,
time    FROM CalcData
GROUP BY time,TumblingWindow(minute, 1)
)

I need to order by column fqn, but it gives the error

Syntax errors: The ORDER BY clause is not valid in views, inline functions, derived tables, sub-queries, and common table expressions, unless TOP or FOR XML or OFFSET is also specified.

Any help will be greatly appreciated.

Edit: also I tried

I tried

CalcData AS
 (
SELECT TOP 10000
x.fqn AS fqn,       
x.value as xvalue,
y.value as yvalue,
x.time as time
FROM (select fqn
,value, time from DataInput1 ) y
join
(SELECT
fqn as fqn,
value as value,
time as time
FROM DataInput2 ) x on y.time=x.time and x.fqn=y.fqn and DATEDIFF (second,   x, y) = 0
order by time, fqn 
),
FormatData AS
(
SELECT UDA.svgstring(CalcData) AS v,
time    FROM CalcData
 GROUP BY time,TumblingWindow(minute, 1)
)

but it doesn't look like the order by is taking effect


Solution

  • Stream Analytics does not have an order by clause at this point. In your original query order by was before the group by window, since ASA is processing an unbounded stream, to be able to order by a non-timestamp column, you will have to group the events into windows first. And within that window, you can order all the events.

    I see that you have time, fqn as order by columns. Assuming time is the timestamp of payload, you can achieve ordering by application time using select ... from input timestamp by time.

    And then you can do one more level of ordering inside UDA.svgstring. Inputs to the function will not be ordered by fqn, it will be ordered by time. You will have to accumulate all the events and when computeResult() is called, order by fqn and create the string on ordered rows.