Search code examples
azure-stream-analytics

Azure Stream analytics query not returning result set when using timestamp by


I am trying to extract the parts per minute produced where v is an aggregated counter of the parts produced till that time.

My azure SQL Query is as below

select 
x.fqn,
( max(cast(y.arrayvalue.v as BIGINT))-(min(cast(y.arrayvalue.v as BIGINT)))) as ppm
 from
(SELECT
   TS.ArrayIndex,
   TS.ArrayValue.FQN,
   TS.ArrayValue.vqts
FROM
[EventHubInput] as hub    
timestamp by y.arrayvalue.t
CROSS APPLY GetArrayElements(hub.timeseries) AS TS) as x
cross apply GetArrayElements(x.vqts) AS y
where x.fqn like '%Production%' and y.arrayvalue.q=192
group by tumblingwindow(minute,1),x.fqn

My input data looks like this

{
 "timeSeries": [
{
    "fqn":"MyEnterprise.Gateways.GatewayE.CLX.Tags.StateBasic",
        "vqts":[
                {
                "v": "" ,
                "q": 192 ,
                "t":"2016-06-24T16:39:45.683+0000"
                }
    ]
},              {
    "fqn":"MyEnterprise.Gateways.GatewayE.CLX.Tags.ProductionCount",
        "vqts":[
                {
                "v": 264 ,
                "q": 192 ,
                "t":"2016-06-24T16:39:45.683+0000"
                }
    ]
},              {
    "fqn":".Gateways.GatewayE.CLX.Tags.StateDetailed",
        "vqts":[
                {
                "v": "" ,
                "q": 192 ,
                "t":"2016-06-24T16:39:45.683+0000"
                }
    ]
}           ]

My query returns no result. when I remove the timestamp by y.arrayvalue.t and add y.arrayvalue.t in the group by clause, I get some result. I realize that maybe this is because I have more than 1 timestamp field for each event, So I wanted to know if it is possible to assign the time data of the first array to timestamp by...something like timestamp by y[0].t


Solution

  • As of today, Azure Stream Analytics does not support timestamp by over a value inside an array. So the answer to your question "if it is possible to assign the time data of the first array to timestamp by" is NO.

    Here is a workaround you can use:

    First, flatten the input message in one job and output to a staging Event Hub:

    WITH flattenTS AS
    (
        SELECT
           TS.ArrayIndex,
           TS.ArrayValue.FQN,
           TS.ArrayValue.vqts
        FROM [EventHubInput]
        CROSS APPLY GetArrayElements(hub.timeseries) AS TS
    )
    , flattenVQTS AS
    (
        SELECT
            ArrayIndex
            ,FQN
            ,vqts.ArrayValue.v as v
            ,vqts.ArrayValue.q as q
            ,vqts.ArrayValue.t as t
        FROM flattenTS TS
        CROSS APPLY GetArrayElements(TS.vqts) AS vqts
    )
    
    SELECT *
    INTO [staging_eventhub]
    FROM flattenVQTS
    

    Then, use another job to read the flattened messages and do the windowed aggregation:

    SELECT
        FQN
        ,MAX(CAST(v as BIGINT))-MIN(CAST(v as BIGINT)) as ppm
    FROM [staging_eventhub] timestamp by t
    WHERE fqn LIKE '%Production%' and q=192
    GROUP BY tumblingwindow(minute,1), fqn
    

    You may wonder can we just combine above two jobs as multiple steps in a single job and avoid the staging Event Hub. Unfortunately, you cannot use "timestamp by" when you select from CTE or subquery today.