I am trying to do a join in stream analytics with a reference data.
Below is the input from stream data.
[{
"id":"111111101",
"basetime":0,
"xyz":
[
{
"xxx":1,
"yyy":2631,
"aaa":"470A01",
"id":1
},
{
"xxx":0,
"yyy":0,
"aaa":"000000",
"id":61
}
]
},
{
"id":"111111102",
"basetime":0,
"xyz":
[
{
"xxx":1,
"yyy":2631,
"aaa":"03F4EB",
"id":1
}
]
},
{
"id":"111111103",
"basetime":0,
"xyz":
[
{
"xxx":1,
"yyy":2631,
"aaa":"6706",
"id":1
}
]
}
]
Below is the reference Master data.
[
{
"aaa": "470A01"
},
{
"aaa": "03F4EB"
},
{
"aaa": "710211"
}
]
The SAQL written is as shown below.
WITH INPUT1 AS (
SELECT
input.id.dateTime AS ID,
flatArrayElement as ABC,
FROM [signals2] as input
CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement
)
I have used CROSS APPLY to get each element in xyz as different rows.
The output of INPUT1 is as shown below.
+----------+------------------------------------------------------------------------+
| ID | ABC |
+----------+------------------------------------------------------------------------+
| 111111101| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} |
| 111111101| {"ArrayValue":{"xxx":0,"yyy":0,"aaa":000000,"id":61},"ArrayIndex":1} |
| 111111102| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":03F4EB,"id":1},"ArrayIndex":0} |
| 111111103| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":6706,"id":1},"ArrayIndex":0} |
+-------------------+---------------------------------------------------------------+
Now I am trying to join the data xyz.aaa with a reference data as below, where master is the reference data.
SIGNALS AS (
SELECT * FROM INPUT1 I JOIN master M ON I.ABC.ArrayValue.aaa = M.aaa
I am getting below output, but the problem is xyz with more than one element is duplicating in the output.
+-------------------------------+------------------------------------------------------------------------------------------+--------+
| i___timestamp | i | m |
+-------------------------------+------------------------------------------------------------------------------------------+--------+
| "2019-11-13T03:36:22.4636494Z"| "id": "111111101",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} | 470A01 |
| "2019-11-13T03:36:22.4636494Z"| "id": "111111101",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} | 470A01 |
| "2019-11-13T03:36:22.4636494Z"| "id": "111111102",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":03F4EB,"id":1},"ArrayIndex":0} | 03F4EB |
+-------------------------------+------------------------------------------------------------------------------------------+--------+
I am confused why the first two rows are duplicating,it should be only one entry for that. Out of two elements in xyz one is valid and one is invalid. But here the valid element is repeating two times. What might be the reason? How to resolve this ?
Please see my query sql:
WITH INPUT1 AS (
SELECT
input.name as name,
flatArrayElement as ABC
FROM
[YourInputAlias] as input
CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement
)
SELECT INPUT1.ABC.ArrayValue.aaa FROM INPUT1
JOIN jayrefer on INPUT1.ABC.ArrayValue.aaa = jayrefer.item
jayrefer is your Master Reference:
input is your Input:
Output:
Just for summary,the issue is caused by the duplicate raws in the reference data
so that the final results is duplicate.
The final sql is:
WITH INPUT AS (
SELECT
input1.id.dateTime AS ID,
flatArrayElement as ABC
FROM input1
CROSS APPLY GetArrayElements(input1.xyz) AS flatArrayElement
)
SELECT * FROM INPUT I JOIN jayrefer M ON I.ABC.ArrayValue.aaa = M.aaa