Search code examples
azureazure-stream-analytics

Join in Stream Analytics with an element in an Array


I am trying to do a join in stream analytics with a reference data.

Below is the input from stream data.

[{
               "id":"111111101",
               "basetime":0,
               "xyz":
               [
                              {
                                             "xxx":1,
                                             "yyy":2631,
                                             "aaa":"470A01",
                                             "id":1
                              },
                              {
                                             "xxx":0,
                                             "yyy":0,
                                             "aaa":"000000",
                                             "id":61
                              }
               ]

},
{
               "id":"111111102",
               "basetime":0,
               "xyz":
               [
                              {
                                             "xxx":1,
                                             "yyy":2631,
                                             "aaa":"03F4EB",
                                             "id":1
                              }
               ]
},
{
               "id":"111111103",
               "basetime":0,
               "xyz":
               [
                              {
                                             "xxx":1,
                                             "yyy":2631,
                                             "aaa":"6706",
                                             "id":1
                              }
              ]
}
]

Below is the reference Master data.

[
{
    "aaa": "470A01"
  },
  {
    "aaa": "03F4EB"
  },
  {
    "aaa": "710211"
  }
]

The SAQL written is as shown below.

WITH INPUT1 AS (
    SELECT
    input.id.dateTime AS ID,
    flatArrayElement as ABC,
    FROM [signals2] as input
    CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement
    )

I have used CROSS APPLY to get each element in xyz as different rows.

The output of INPUT1 is as shown below.

+----------+------------------------------------------------------------------------+
|       ID |   ABC                                                                  |
+----------+------------------------------------------------------------------------+
| 111111101| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} |
| 111111101| {"ArrayValue":{"xxx":0,"yyy":0,"aaa":000000,"id":61},"ArrayIndex":1}   |
| 111111102| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":03F4EB,"id":1},"ArrayIndex":0} |
| 111111103| {"ArrayValue":{"xxx":1,"yyy":2631,"aaa":6706,"id":1},"ArrayIndex":0}   |
+-------------------+---------------------------------------------------------------+

Now I am trying to join the data xyz.aaa with a reference data as below, where master is the reference data.

SIGNALS AS (
 SELECT * FROM INPUT1 I JOIN master M ON I.ABC.ArrayValue.aaa = M.aaa

I am getting below output, but the problem is xyz with more than one element is duplicating in the output.

+-------------------------------+------------------------------------------------------------------------------------------+--------+
| i___timestamp                 |   i                                                                                      |  m     |     
+-------------------------------+------------------------------------------------------------------------------------------+--------+
| "2019-11-13T03:36:22.4636494Z"| "id": "111111101",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} | 470A01 |
| "2019-11-13T03:36:22.4636494Z"| "id": "111111101",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":470A01,"id":1},"ArrayIndex":0} | 470A01 |
| "2019-11-13T03:36:22.4636494Z"| "id": "111111102",{"ArrayValue":{"xxx":1,"yyy":2631,"aaa":03F4EB,"id":1},"ArrayIndex":0} | 03F4EB |
+-------------------------------+------------------------------------------------------------------------------------------+--------+

I am confused why the first two rows are duplicating,it should be only one entry for that. Out of two elements in xyz one is valid and one is invalid. But here the valid element is repeating two times. What might be the reason? How to resolve this ?


Solution

  • Please see my query sql:

    WITH INPUT1 AS (
        SELECT
            input.name as name,
            flatArrayElement as ABC
        FROM
            [YourInputAlias] as input
            CROSS APPLY GetArrayElements(input.xyz) AS flatArrayElement
    )
    
    SELECT INPUT1.ABC.ArrayValue.aaa FROM INPUT1
    JOIN jayrefer on INPUT1.ABC.ArrayValue.aaa = jayrefer.item
    

    jayrefer is your Master Reference:

    enter image description here

    input is your Input:

    enter image description here

    Output:

    enter image description here


    Just for summary,the issue is caused by the duplicate raws in the reference data so that the final results is duplicate.

    The final sql is:

    WITH INPUT AS (
        SELECT
        input1.id.dateTime AS ID,
        flatArrayElement as ABC
        FROM input1
        CROSS APPLY GetArrayElements(input1.xyz) AS flatArrayElement
    )
    SELECT * FROM INPUT I JOIN jayrefer M ON I.ABC.ArrayValue.aaa = M.aaa