Search code examples
jsonmultidimensional-arrayazure-data-lakeu-sql

USql Call data in multidimensional JSON array


I have this JSON file in a data lake that looks like this:

{
"id":"398507",
"contenttype":"POST",
"posttype":"post",
"uri":"http://twitter.com/etc",
"title":null,
"profile":{
   "@class":"PublisherV2_0",
   "name":"Company",
   "id":"2163171",
   "profileIcon":"https://pbs.twimg.com/image",
   "profileLocation":{
      "@class":"DocumentLocation",
      "locality":"Toronto",
      "adminDistrict":"ON",
      "countryRegion":"Canada",
      "coordinates":{
         "latitude":43.7217,
         "longitude":-31.432},
         "quadKey":"000000000000000"},
      "displayName":"Name",
      "externalId":"00000000000"},
   "source":{
       "name":"blogs",
       "id":"18",
       "param":"Twitter"},
   "content":{
       "text":"Description of post"},
       "language":{
           "name":"English",
           "code":"en"},
       "abstracttext":"More Text and links",
       "score":{}
   }
}

in order to call the data into my application, I have to turn the JSON into a string using this code:

DECLARE @input string = @"/MSEStream/{*}.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];


@allposts = 
EXTRACT 
    jsonString  string
FROM @input
USING Extractors.Text(delimiter:'\b', quoting:true);

@extractedrows = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS er FROM @allposts;


@result = 
SELECT er["id"] AS postID,
       er["contenttype"] AS contentType,
       er["posttype"] AS postType,
       er["uri"] AS uri,
       er["title"] AS Title,
       er["acquisitiondate"] AS acquisitionDate,
       er["modificationdate"] AS modificationDate,
       er["publicationdate"] AS publicationDate,
       er["profile"] AS profile
FROM @extractedrows;

OUTPUT  @result
TO  "/ProcessedQueries/all_posts.csv"
USING Outputters.Csv();

This output the JSON into a .csv file that is readable and when I download the file all data is displayed properly. My problem is when I need to get the data inside profile. Because the JSON is now a string I can't seem to extract any of that data and put it into a variable to use. Is there any way to do this? or do I need to look into other options for reading the data?


Solution

  • You can use JsonTuple on the profile string to further extract the specific properties you want. An example of U-SQL code to process nested Json is provided in this link - https://github.com/Azure/usql/blob/master/Examples/JsonSample/JsonSample/NestedJsonParsing.usql.

    You can use JsonTuple on the profile column to further extract specific nodes

    E.g. use JsonTuple to get all the child nodes of the profile node and extract specific values like how you did in your code.

    @childnodesofprofile = 
    SELECT
       Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile) AS childnodes_map
    FROM @result;
    
    @values =
    SELECT
       childnodes_map["name"] AS name,
       childnodes_map["id"] AS id
    FROM @result;
    

    Alternatively, if you are interested in specific values, you can also pass paramters to the JsonTuple function to get the specific nodes you want. The code below gets the locality node from the recursively nested nodes (as described by the "$..value" construct.

    @locality = 
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile, "$..locality").Values AS locality
    FROM @result;
    

    Other supported constructs by JsonTuple

        JsonTuple(json, "id", "name")              // field names          
        JsonTuple(json, "$.address.zip")           // nested fields        
        JsonTuple(json, "$..address")              // recursive children   
        JsonTuple(json, "$[?(@.id > 1)].id")       // path expression      
        JsonTuple(json)                            // all children
    

    Hope this helps.