I have a JSON document that looks as follows:
{
"Region": "Main",
"MarketLocations": [
{
"MarketName": "Central",
"MarketId": 1,
"SalesCategories": {
"Produce": [
{
"Type": "Apple",
"Name": "Granny Smith",
"DatePicked": "2016-11-08T14:14:33.712Z",
"ShelfLifeInDays": 24,
"Calories": 45,
"Price": 0.29
}
],
"BakedGoods": [
{
"DateMade": "2016-11-08T14:14:33.712Z",
"Name": "Apple Pie",
"Price": 14.25
}
],
"RestaurantItems": [
{
"Name": "Turkey Sandwich",
"Price": 4.85,
"PreparationTimeInMinutes": 20
}
],
"NonPerishable": [
{
"Name": "Honey Mustard",
"Type": "Condiments"
}
]
}
},
{
"MarketName": "Southern",
"MarketId": 2,
"SalesCategories": {
"Produce": [
{
"Type": "Apple",
"Name": "Granny Smith",
"DatePicked": "2016-11-08T14:14:33.712Z",
"ShelfLifeInDays": 24,
"Calories": 45,
"Price": 0.29
},
{
"Type": "Plums",
"Name": "Red Plums",
"DatePicked": "2016-11-08T14:14:33.712Z",
"ShelfLifeInDays": 12,
"Calories": 21,
"Price": 0.33
},
{
"Type": "Pears",
"Name": "Golden Nature",
"DatePicked": "2016-11-08T14:14:33.712Z",
"ShelfLifeInDays": 20,
"Calories": 40,
"Price": 0.45
}
],
"BakedGoods": [
{
"DateMade": "2016-11-08T14:14:33.712Z",
"Name": "Apple Pie",
"Price": 14.25
}
],
"RestaurantItems": [
{
"Name": "Turkey Sandwich",
"Price": 4.85,
"PreparationTimeInMinutes": 20
}
],
"NonPerishable": [
{
"Name": "Honey Mustard",
"Type": "Condiments"
}
]
}
},
{
"MarketName": "Western",
"MarketId": 3,
"SalesCategories": {
"Produce": [
{
"Type": "Plums",
"Name": "Red Plums",
"DatePicked": "2016-11-08T14:14:33.712Z",
"ShelfLifeInDays": 12,
"Calories": 21,
"Price": 0.33
},
{
"Type": "Pears",
"Name": "Golden Nature",
"DatePicked": "2016-11-08T14:14:33.712Z",
"ShelfLifeInDays": 20,
"Calories": 40,
"Price": 0.45
}
],
"BakedGoods": [
{
"DateMade": "2016-11-08T14:14:33.712Z",
"Name": "Plum Pie",
"Price": 18.25
}
],
"RestaurantItems": [
{
"Name": "Ham Sandwich",
"Price": 4.85,
"PreparationTimeInMinutes": 20
},
{
"Name": "Chicken Soup",
"Price": 2.25,
"PreparationTimeInMinutes": 5
}
],
"NonPerishable": [
{
"Name": "Mayo",
"Type": "Condiments"
},
{
"Name": "Syrup",
"Type": "Condiments"
},
{
"Name": "Ginger",
"Type": "Spices"
}
]
}
}
]
}
I have the following U-SQL, that processes this JSON file, running inside Visual Studio:
DECLARE @in string=@"/JsonDoc2.json";
DECLARE @out string=@"Output/JsonDoc2.csv";
@produce =
EXTRACT Name string,
DatePicked DateTime,
ShelfLifeInDays int,
Calories int,
Price decimal,
MarketId string,
MarketName string
FROM @in
USING new MultiLevelJsonExtractor("MarketLocations[*].SalesCategories.Produce[*]",
false,
"Name",
"DatePicked",
"ShelfLifeInDays",
"Calories",
"Price",
"MarketId",
"MarketName");
OUTPUT @produce
TO @out
USING Outputters.Csv(outputHeader : true);
This executes without error. The problem is that I am specifically specifying what sales category I want ('produce'). I'd like to change this query so that that all sales categories are included (produce, baked goods etc.) with the category name included. I've not been able to figure out a way to do this.
The JsonType
method of the NewtonSoft JsonFunctions class, returns a MAP
value which is a key-value pair. You can then reference the key to get the JSON property / object / array names, at least after a few other manipulations with CROSS APPLY
and EXPLODE
.
For your example, I got the following to work:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
DECLARE @input string = @"/input/myinputfile.json";
DECLARE @output string = @"output/output.csv";
@json =
EXTRACT Region string,
MarketName string,
SalesCategories string // get the SalesCategories as JSON
FROM @input
USING new MultiLevelJsonExtractor("MarketLocations[*].SalesCategories",
true,
"Region",
"MarketName",
"SalesCategories"
);
// Convert the json string to tuple/MAP
@working =
SELECT Region,
MarketName,
JsonFunctions.JsonTuple(SalesCategories) AS x
FROM @json;
// Explode the tuple as key-value pair;
@working =
SELECT Region,
MarketName,
key,
value
FROM @working
CROSS APPLY
EXPLODE(x) AS y(key, value);
// Explode the value which is JSON
@working =
SELECT Region,
MarketName,
key,
JsonFunctions.JsonTuple(y) AS z
FROM @working
CROSS APPLY
EXPLODE(JsonFunctions.JsonTuple(value).Values) AS x(y);
// Prep the result, naming the items you want
@result =
SELECT Region,
MarketName,
key,
z["Type"] AS Type,
z["Name"] AS Name,
z["DatePicked"] AS DatePicked,
z["ShelfLifeInDays"] AS ShelfLifeInDays,
z["Calories"] AS Calories,
z["Price"] AS Price,
z["DateMade"] AS DateMade,
z["PreparationTimeInMinutes"] AS PreparationTimeInMinutes
FROM @working;
OUTPUT @result
TO @output
USING Outputters.Csv(quoting:false);
It feels like it could be simplified, but see how you get on. The samples for shredding JSON are in short supply but try here and here.