Search code examples
jsonazureazure-data-lakeu-sqldata-lake

How to use JSON file formats in the context of Azure Data Lake Analytics respectively usql


I have a JSON input that looks like

{
    "sessionId": 1234,
    "deviceId": "MAC:1234",
    "IoTHub": {
        "MessageId": "1234-1234-1234-1234"
    }
}

How can I extract the values of sessionId, deviceId and MessageId in a Azure Datalake Analytics usql script?


Solution

  • How to use JSON file formats in the context of Azure Data Lake Analytics respectively usql

    Set Up process

    • Download the repo from [1]
      • Open the solution in .\Examples\DataFormats\Microsoft.Analytics.Samples.sln
      • Build the solution
      • Get .\Examples\DataFormats\Microsoft.Analytics.Samples.Formats\bin\Debug\Microsoft.Analytics.Samples.Formats.dll
      • Get .\Examples\DataFormats\Microsoft.Analytics.Samples.Formats\bin\Debug\Newtonsoft.Json.dll
    • In ADLS create a folder where to store the assemblies (e.g. .\assemblies)
      • Create a sample file in the folder if it disappears (happened to me)
    • With Visual Studio:
      • Add both files to the folder of the ADLA solution (e.g. .\lib\...)
      • Open Cloud Explorer, navigate to the ADLA Database -> Assemblies -> right click and register the assemblies
        • Store the assemblies in the folder created before in ADLS
    • Reference the assemblies as can be seen below

    Usage

    JSON Input Data

    • Use the Extractor as e.g. shown below
      • For more information see [2] and [3]

    Resources

    [1] GitHub Azure USQL

    [2] GitHub Azure USQL DataFormats

    [3] U-SQL - Extract data from json-array


    U-SQL Script

    DECLARE @localDevelopment bool = true;
    
    IF @localDevelopment == true THEN
        DROP ASSEMBLY IF EXISTS [Newtonsoft.Json];
        DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats];
        CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/lib/Newtonsoft.Json.dll";
        CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/lib/Microsoft.Analytics.Samples.Formats.dll";
        DECLARE @input string = @"/data/input.json";
        DECLARE @output string = @"/data/output.csv";
    ELSE
        DECLARE @input string = @"/data/input.json";
        DECLARE @output string = @"/data/output.csv";
    END;
    
    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 
    
    //Define schema of file, must map all columns
    //Names must match keys
    @extractDataFirstLevel = 
        EXTRACT sessionId int,
                deviceId string,
                IoTHub string
                //Date DateTime
        FROM @input
        USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
    
    @selectData =
        SELECT sessionId,
               deviceId,
               Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(IoTHub)["MessageId"] AS messageId
        FROM @extractDataFirstLevel;
    
    OUTPUT @selectData
    TO @output
    USING Outputters.Csv();