I am trying to pull some logged events from Application Insights into our SQL database. I have no control over the format of the inputs which are json files composed of multiple json arrays within the file. In each record, 5 pieces of information are in a json array at [context].[custom].[dimensions] in the file and using an OUTER APPLY flattens these values. The problem is it returns results not as one row per record but as though you had joined one row with 5 (which is indeed what it has done) and the values of the 5 pieces of data are NULL in 4 cases and the actual value in the other. I only need 2 of the 5 values - PageType and UserId - and given this in my GROUP BY it returns 3 records, one with each value and one with both of them null.
In normal SQL you would simply use a MAX expression to get the real values for each but in Stream Analytics you can't use MAX on strings. You also can't use COALESCE and a number of other methods I tried to resolve this with. Any ideas how the results can be changed from:
EventDateTime Event PageType UserId AppVersion CountA
2017-05-24 Nav Show NULL NULL 2.0.1293 1
2017-05-24 Nav Show NULL SIRTSW 2.0.1293 1
2017-05-24 Nav Show Trade NULL 2.0.1293 1
to
2017-05-24 Nav Show Trade SIRTSW 2.0.1293 1 ?
The code that returns three rows for each is as follows (note that e.event is an array of one item so it does not cause the same issue):
SELECT flatEvent.ArrayValue.name as Event,
e.context.data.eventTime as EventDateTime,
e.context.application.version as AppVersion
,flatCustom.ArrayValue.UserId as UserId
,flatCustom.ArrayValue.PageType as PageType,
SUM(flatEvent.ArrayValue.count) as CountA
INTO
[insights]
FROM [ios] e
CROSS APPLY GetArrayElements(e.[event]) as flatEvent
OUTER APPLY GetArrayElements(e.[context].[custom].[dimensions]) as flatCustom
GROUP BY SlidingWindow(minute, 1),
flatEvent.ArrayValue.name,
e.context.data.eventTime,
e.context.application.version,
flatCustom.ArrayValue.UserId,
flatCustom.ArrayValue.PageType
Thanks in advance, Rob
According to your scenario, I assumed that you could use JavaScript user-defined functions for Azure Stream Analytics to coalesce the multiple dimensions into a single record. Here are my test for this issue, you could refer to them.
JSON file
{
"context":{
"data":{"eventTime":"2017-05-24"},
"application":{"version":"2.0.1293"},
"custom":{
"dimensions":[
{"PageType":null,"UserId":"SIRTSW"},
{"PageType":"Trade","UserId":null},
{"PageType":null,"UserId":null}
]
}
},
"event":[
{"name":"Nav Show","count":1}
]
}
javascript UDF, UDF.coalesce
function main(items) {
var result=[];
var UserIdStr="",PageTypeStr="";
for(var i=0;i<items.length;i++){
if(items[i].UserId!=null && items[i].UserId!=undefined)
UserIdStr+=items[i].UserId;
if(items[i].PageType!=null && items[i].PageType!=undefined)
PageTypeStr+=items[i].PageType;
}
result.push({UserId:UserIdStr,PageType:PageTypeStr});
return result;
}
Query
--first query
WITH f AS (
SELECT
e.context.data.eventTime as EventDateTime,
e.context.application.version as AppVersion,
e.event as flatEvent,
UDF.coalesce(e.[context].[custom].[dimensions]) as flatDimensions
FROM [ios] e
)
--second query
SELECT flatEvent.ArrayValue.name as Event,
f.EventDateTime,
f.AppVersion,
flatDimension.ArrayValue.UserId,
flatDimension.ArrayValue.PageType,
SUM(flatEvent.ArrayValue.count) as CountA
FROM f
CROSS APPLY GetArrayElements(f.[flatEvent]) as flatEvent
OUTER APPLY GetArrayElements(f.[flatDimensions]) as flatDimension
GROUP BY SlidingWindow(minute, 1),
flatEvent.ArrayValue.name,
f.EventDateTime,
f.AppVersion,
flatDimension.ArrayValue.UserId,
flatDimension.ArrayValue.PageType