Search code examples
jsonazureu-sql

U-SQL - Extract data from complex nested json file


My json structure is the following:

{
"First":"xxxx",
"Country":"XX",
"Loop": {
    "Links": [
        {
            "Url":"xxxx",
            "Time":123
        }, {
            "Url":"xxxx",
            "Time":123
        }],
    "TotalTime":123,
    "Date":"2018-04-09T10:29:39.0233082+00:00"
}

I'd like to extract the properties

First
Country
Url & Time foreach object in the array
TotalTime
Date

This is my query

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

@extration = 
EXTRACT 
    jsonString string 
FROM @"/storage-api/input.json" 
USING Extractors.Tsv(quoting:false);

@cleanUp = SELECT jsonString FROM @extration WHERE (!jsonString.Contains("Part: h" ) AND jsonString!= "465}");

@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS obj FROM @cleanUp;

@columnized = SELECT 
        obj["First"] AS first,
        obj["Country"] AS country
FROM @jsonify;

OUTPUT @columnized
TO @"/storage-api/outputs/tpe1-output.csv"
USING Outputters.Csv();

But this query only extract the first 2 properties, I don't know how to query nested data inside "Loop"


Solution

  • You can do that using the MultiLevelJsonExtractor (notes here) and a JSON path eg Loop.Links[*]. MultiLevelJsonExtractor has a nice feature where if your node isn't found at your base path it will check recursively for it, although I'm not sure how the performance would scale on a large JSON documents or a large number of JSON documents.

    Try this:

    DECLARE @input string = "/input/input65.json";
    
    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 
    
    USING Microsoft.Analytics.Samples.Formats.Json;
    
    @result =
        EXTRACT First string,
                Country string,
                Date DateTime,
                Url string,
                Time string,
                TotalTime int
        FROM @input
        USING new MultiLevelJsonExtractor("Loop.Links[*]",
    
              false,
              "First",
              "Country",
              "Date",
              "Url",
              "Time",
              "TotalTime"
              );
    
    
    OUTPUT @result
    TO "/output/output.csv"
    USING Outputters.Csv();
    

    My results:

    Results

    HTH