Search code examples
jsonazureflattendataflowazure-synapse

Azure Synapse Analytics Json Flatten


I'm new to Azure Synapse and currently have the following problem:

I get a json that looks like the following:

{
"2022-02-01":[
    {
      "shiftId": ,
      "employeeId": ,
      "duration": ""
    },
    {
      "shiftId": ,
      "employeeId": ,
      "duration": ""
    }
],
  "2022-02-02": [
    {
      "shiftId": ,
      "employeeId": ,
      "duration": ""
    }
],
"2022-02-03": [
    {
      "shiftId": ,
      "employeeId": ,
      "duration": ""
    },
    {
      "shiftId": ,
      "employeeId": ,
      "duration": ""
    }
],
  "2022-02-4": []
}

Now I would like to convert this so that I get it in a view. I have already tried with a dataflow as array of documents but I get an error.

"Malformed records are detected in schema inference. Parse Mode: FAILFAST"

I want something like:

date         shiftId   employeeId   duration
___________|_________|____________|_________
2022-02-01 | 1234    | 345345     | 420
2022-02-01 | 2345    | 345345     | 124
2022-02-02 | 5345    | 123567     | 424
2022-02-03 | 5675    | 987542     | 123
2022-02-03 | 9456    | 234466     | 754

Solution

  • Azure Synapse Analytics, dedicated SQL pools are actually very capable with JSON, supporting OPENJSON and JSON_VALUE, so you could just use a Stored Procedure with the JSON as a parameter. A simple exmaple:

    SELECT
        k.[key] AS [shiftDate],
        JSON_VALUE( d.[value], '$.shiftId' ) shiftId,
        JSON_VALUE( d.[value], '$.employeeId' ) employeeId,
        JSON_VALUE( d.[value], '$.duration' ) duration
    FROM OPENJSON( @json, '$' ) k
        CROSS APPLY OPENJSON( k.value, '$' ) d;
    

    The full code:

    DECLARE @json NVARCHAR(MAX) = '{
        "2022-02-01": [
            {
                "shiftId": 1234,
                "employeeId": 345345,
                "duration": 420
            },
            {
                "shiftId": 2345,
                "employeeId": 345345,
                "duration": 124
            }
        ],
        "2022-02-02": [
            {
                "shiftId": 5345,
                "employeeId": 123567,
                "duration": 424
            }
        ],
        "2022-02-03": [
            {
                "shiftId": 5675,
                "employeeId": 987542,
                "duration": 123
            },
            {
                "shiftId": 9456,
                "employeeId": 234466,
                "duration": 754
            }
        ]
    }'
    
    
    SELECT
        k.[key] AS [shiftDate],
        JSON_VALUE( d.[value], '$.shiftId' ) shiftId,
        JSON_VALUE( d.[value], '$.employeeId' ) employeeId,
        JSON_VALUE( d.[value], '$.duration' ) duration
    FROM OPENJSON( @json, '$' ) k
        CROSS APPLY OPENJSON( k.value, '$' ) d;
    

    My results:

    enter image description here

    You could use a Synapse Notebook or Mapping Data Flows if you wanted something more dynamic.