I have this JSON file in a data lake that looks like this:
{
"id":"398507",
"contenttype":"POST",
"posttype":"post",
"uri":"http://twitter.com/etc",
"title":null,
"profile":{
"@class":"PublisherV2_0",
"name":"Company",
"id":"2163171",
"profileIcon":"https://pbs.twimg.com/image",
"profileLocation":{
"@class":"DocumentLocation",
"locality":"Toronto",
"adminDistrict":"ON",
"countryRegion":"Canada",
"coordinates":{
"latitude":43.7217,
"longitude":-31.432},
"quadKey":"000000000000000"},
"displayName":"Name",
"externalId":"00000000000"},
"source":{
"name":"blogs",
"id":"18",
"param":"Twitter"},
"content":{
"text":"Description of post"},
"language":{
"name":"English",
"code":"en"},
"abstracttext":"More Text and links",
"score":{}
}
}
in order to call the data into my application, I have to turn the JSON into a string using this code:
DECLARE @input string = @"/MSEStream/{*}.json";
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
@allposts =
EXTRACT
jsonString string
FROM @input
USING Extractors.Text(delimiter:'\b', quoting:true);
@extractedrows = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS er FROM @allposts;
@result =
SELECT er["id"] AS postID,
er["contenttype"] AS contentType,
er["posttype"] AS postType,
er["uri"] AS uri,
er["title"] AS Title,
er["acquisitiondate"] AS acquisitionDate,
er["modificationdate"] AS modificationDate,
er["publicationdate"] AS publicationDate,
er["profile"] AS profile
FROM @extractedrows;
OUTPUT @result
TO "/ProcessedQueries/all_posts.csv"
USING Outputters.Csv();
This output the JSON into a .csv file that is readable and when I download the file all data is displayed properly. My problem is when I need to get the data inside profile. Because the JSON is now a string I can't seem to extract any of that data and put it into a variable to use. Is there any way to do this? or do I need to look into other options for reading the data?
You can use JsonTuple on the profile string to further extract the specific properties you want. An example of U-SQL code to process nested Json is provided in this link - https://github.com/Azure/usql/blob/master/Examples/JsonSample/JsonSample/NestedJsonParsing.usql.
You can use JsonTuple on the profile column to further extract specific nodes
E.g. use JsonTuple to get all the child nodes of the profile node and extract specific values like how you did in your code.
@childnodesofprofile =
SELECT
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile) AS childnodes_map
FROM @result;
@values =
SELECT
childnodes_map["name"] AS name,
childnodes_map["id"] AS id
FROM @result;
Alternatively, if you are interested in specific values, you can also pass paramters to the JsonTuple function to get the specific nodes you want. The code below gets the locality node from the recursively nested nodes (as described by the "$..value" construct.
@locality =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile, "$..locality").Values AS locality
FROM @result;
Other supported constructs by JsonTuple
JsonTuple(json, "id", "name") // field names
JsonTuple(json, "$.address.zip") // nested fields
JsonTuple(json, "$..address") // recursive children
JsonTuple(json, "$[?(@.id > 1)].id") // path expression
JsonTuple(json) // all children
Hope this helps.