Search code examples
jsonazureazure-data-lakeu-sql

USQL call to read json array without a key


I have a nested json that I am trying to flatten in usql. I cannot share the data, but the structure is similar to this.

{
    "userlist": [user1, user1],
    "objects": {
        "largeobjects": [object1, object2, object3]
        "smallobjects": [s_object1, s_object2]
    },
    "applications": [{
            "application": sdq3ds5dsa
        }, {
            "application": dksah122j4
        }, {
            "application": sadsw2dq2s
        }, {
            "application": pro3dfdsn3
        }
    ],
    "date" : 12344232,
    "timezone" : "Asia",
    "id" : "sad2ddssa2",
    "admin": {
        "lang": "eng",
        "country": "us",
    }
}

I am using the custom jsonoutputter (https://github.com/Azure/usql/tree/master/Examples/DataFormats/Microsoft.Analytics.Samples.Formats) to extract from the json file and the jsontuple function to extract the values. My problem is that the function uses sql map which generates key value pairs. This works for the situations where I have a key, but it throws an error when I try to use that function to get the values from the no key array.

Any suggestion regarding how to solve this would be greatly appreciated.

EDIT This is the output I am looking after:

sad2ddssa2, object1, 12344232, "Asia", "eng", "us",
sad2ddssa2, object2, 12344232, "Asia", "eng", "us"


Solution

  • First option

    Try to use PROSE within your u-sql. Use the PROSE's c# nuget to process data and do complex extractions. This is a very powerful AI package. See the videos and examples here: https://microsoft.github.io/prose

    Second option

    Create a c# function to process your json. Something like this one, adapt this sample to your custom extraction request using c# json api's:

    /* Formats the array of values into a named json array. */
    
    DECLARE @JsonArray Func<SqlArray<string>, string, string> = (data, name) => 
        {
            StringBuilder buffer = new StringBuilder();
    
            buffer.Append("{\r\n\t\"" + name + "\": [\r\n");
    
            for (int i = 0; i < data.Count(); i++)
            {
                if (i > 0)
                {
                    buffer.Append(",\r\n");
                }
    
                buffer.Append("\t\"" + data[i] + "\"");
            }
    
            buffer.Append("\r\n\t]\r\n}");
    
            return buffer.ToString();
        };
    
    /* Format the array containing groups of comma separated values into a named json array */
    
    @Query = 
        SELECT
            @JsonArray(SubscriptionArray, "subscriptionList") AS JsonArray
        FROM @subscriptionsQuery1;
    

    Third option

    Try this approach, after adapting it to your needs:

    /* For each json line create a json map (SqlMap) */
    
    @subscriptionsQuery1 = 
        SELECT 
               JsonFunctions.JsonTuple(JsonLine) AS JsonMap
        FROM @SubscriptionsExtractor AS t;
    
    /* For each json map get the required property value */
    
    @subscriptionsQuery1 = 
        SELECT DISTINCT
               JsonMap["alias"] AS Subscription
        FROM @subscriptionsQuery1 AS t;
    
    /* Join the value of all rows into a single row containing an array of all values */
    
    @subscriptionsQuery1 = 
        SELECT
            ARRAY_AGG<string>(Subscription) AS SubscriptionArray
        FROM @subscriptionsQuery1 AS t;