My json structure is the following:
{
"First":"xxxx",
"Country":"XX",
"Loop": {
"Links": [
{
"Url":"xxxx",
"Time":123
}, {
"Url":"xxxx",
"Time":123
}],
"TotalTime":123,
"Date":"2018-04-09T10:29:39.0233082+00:00"
}
I'd like to extract the properties
First
Country
Url & Time foreach object in the array
TotalTime
Date
This is my query
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
@extration =
EXTRACT
jsonString string
FROM @"/storage-api/input.json"
USING Extractors.Tsv(quoting:false);
@cleanUp = SELECT jsonString FROM @extration WHERE (!jsonString.Contains("Part: h" ) AND jsonString!= "465}");
@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS obj FROM @cleanUp;
@columnized = SELECT
obj["First"] AS first,
obj["Country"] AS country
FROM @jsonify;
OUTPUT @columnized
TO @"/storage-api/outputs/tpe1-output.csv"
USING Outputters.Csv();
But this query only extract the first 2 properties, I don't know how to query nested data inside "Loop"
You can do that using the MultiLevelJsonExtractor
(notes here) and a JSON path eg Loop.Links[*]
. MultiLevelJsonExtractor
has a nice feature where if your node isn't found at your base path it will check recursively for it, although I'm not sure how the performance would scale on a large JSON documents or a large number of JSON documents.
Try this:
DECLARE @input string = "/input/input65.json";
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
@result =
EXTRACT First string,
Country string,
Date DateTime,
Url string,
Time string,
TotalTime int
FROM @input
USING new MultiLevelJsonExtractor("Loop.Links[*]",
false,
"First",
"Country",
"Date",
"Url",
"Time",
"TotalTime"
);
OUTPUT @result
TO "/output/output.csv"
USING Outputters.Csv();
My results:
HTH