Search code examples
sql-serverazure-application-insightsazure-stream-analytics

How to merge multiple records with strings and nulls in a Stream Analytics Group By


I am trying to pull some logged events from Application Insights into our SQL database. I have no control over the format of the inputs which are json files composed of multiple json arrays within the file. In each record, 5 pieces of information are in a json array at [context].[custom].[dimensions] in the file and using an OUTER APPLY flattens these values. The problem is it returns results not as one row per record but as though you had joined one row with 5 (which is indeed what it has done) and the values of the 5 pieces of data are NULL in 4 cases and the actual value in the other. I only need 2 of the 5 values - PageType and UserId - and given this in my GROUP BY it returns 3 records, one with each value and one with both of them null.

In normal SQL you would simply use a MAX expression to get the real values for each but in Stream Analytics you can't use MAX on strings. You also can't use COALESCE and a number of other methods I tried to resolve this with. Any ideas how the results can be changed from:

EventDateTime  Event      PageType UserId  AppVersion CountA
2017-05-24     Nav Show   NULL     NULL    2.0.1293     1
2017-05-24     Nav Show   NULL     SIRTSW  2.0.1293     1
2017-05-24     Nav Show   Trade    NULL    2.0.1293     1

to

2017-05-24     Nav Show   Trade    SIRTSW  2.0.1293     1  ?

The code that returns three rows for each is as follows (note that e.event is an array of one item so it does not cause the same issue):

SELECT flatEvent.ArrayValue.name as Event, 
e.context.data.eventTime as EventDateTime,
e.context.application.version as AppVersion 
,flatCustom.ArrayValue.UserId as UserId
,flatCustom.ArrayValue.PageType as PageType, 
SUM(flatEvent.ArrayValue.count) as CountA
INTO
      [insights] 
    FROM [ios] e
    CROSS APPLY GetArrayElements(e.[event]) as flatEvent
    OUTER APPLY GetArrayElements(e.[context].[custom].[dimensions]) as flatCustom
    GROUP BY SlidingWindow(minute, 1),
    flatEvent.ArrayValue.name,
    e.context.data.eventTime,
    e.context.application.version,
    flatCustom.ArrayValue.UserId,
    flatCustom.ArrayValue.PageType

Thanks in advance, Rob


Solution

  • According to your scenario, I assumed that you could use JavaScript user-defined functions for Azure Stream Analytics to coalesce the multiple dimensions into a single record. Here are my test for this issue, you could refer to them.

    JSON file

    {
      "context":{
         "data":{"eventTime":"2017-05-24"},
         "application":{"version":"2.0.1293"},
         "custom":{
            "dimensions":[
               {"PageType":null,"UserId":"SIRTSW"},
               {"PageType":"Trade","UserId":null},
               {"PageType":null,"UserId":null}
            ]
         }
      },
      "event":[
        {"name":"Nav Show","count":1}
      ]
    }
    

    javascript UDF, UDF.coalesce

    function main(items) {
        var result=[];
        var UserIdStr="",PageTypeStr="";
        for(var i=0;i<items.length;i++){
            if(items[i].UserId!=null && items[i].UserId!=undefined)
             UserIdStr+=items[i].UserId;
            if(items[i].PageType!=null && items[i].PageType!=undefined)
             PageTypeStr+=items[i].PageType;
        }
        result.push({UserId:UserIdStr,PageType:PageTypeStr});
        return result;
    }
    

    Query

    --first query
    WITH f AS (
    SELECT 
    e.context.data.eventTime as EventDateTime,
    e.context.application.version as AppVersion,
    e.event as flatEvent,
    UDF.coalesce(e.[context].[custom].[dimensions]) as flatDimensions
        FROM [ios] e
    )
    
    --second query
    SELECT flatEvent.ArrayValue.name as Event,
    f.EventDateTime,
    f.AppVersion,
    flatDimension.ArrayValue.UserId,
    flatDimension.ArrayValue.PageType,
    SUM(flatEvent.ArrayValue.count) as CountA
    FROM f
    CROSS APPLY GetArrayElements(f.[flatEvent]) as flatEvent
    OUTER APPLY GetArrayElements(f.[flatDimensions]) as flatDimension
    GROUP BY SlidingWindow(minute, 1),
        flatEvent.ArrayValue.name,
        f.EventDateTime,
        f.AppVersion,
        flatDimension.ArrayValue.UserId,
        flatDimension.ArrayValue.PageType
    

    TEST RESULT enter image description here