Search code examples
jsonazure-data-factoryazure-data-lakeu-sql

Parse JSON into U-SQL then convert to csv


I'm trying to convert some telemetry data that is in JSON format into CSV format, then write it out to a file, using U-SQL.

The problem is that some of the JSON key values have periods in them, and so when I'm doing the SELECT operation, U-SQL is not recognizing them. When I check the output file, all that I am seeing is the values for "p1". How can I represent the names of the JSON key names in the script so that they are recognized. Thanks in advance for any help!

Code:

REFERENCE ASSEMBLY MATSDevDB.[Newtonsoft.Json];
REFERENCE ASSEMBLY MATSDevDB.[Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;


@jsonDocuments = 
    EXTRACT jsonString string 
    FROM @"adl://xxxx.azuredatalakestore.net/xxxx/{*}/{*}/{*}/telemetry_{*}.json" 
    USING Extractors.Tsv(quoting:false);

@jsonify = 
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS json 
    FROM @jsonDocuments;

@columnized = SELECT 
            json["EventInfo.Source"] AS EventInfoSource,
            json["EventInfo.InitId"] AS EventInfoInitId,
            json["EventInfo.Sequence"] AS EventInfoSequence,
            json["EventInfo.Name"] AS EventInfoName,
            json["EventInfo.Time"] AS EventInfoTime,
            json["EventInfo.SdkVersion"] AS EventInfoSdkVersion,
            json["AppInfo.Language"] AS AppInfoLanguage,
            json["UserInfo.Language"] AS UserInfoLanguage,
            json["DeviceInfo.BrowserName"] AS DeviceInfoBrowswerName,
            json["DeviceInfo.BrowserVersion"] AS BrowswerVersion,
            json["DeviceInfo.OsName"] AS DeviceInfoOsName,
            json["DeviceInfo.OsVersion"] AS DeviceInfoOsVersion,
            json["DeviceInfo.Id"] AS DeviceInfoId,
            json["p1"] AS p1,
            json["PipelineInfo.AccountId"] AS PipelineInfoAccountId, 
            json["PipelineInfo.IngestionTime"] AS PipelineInfoIngestionTime, 
            json["PipelineInfo.ClientIp"] AS PipelineInfoClientIp,
            json["PipelineInfo.ClientCountry"] AS PipelineInfoClientCountry,
            json["PipelineInfo.IngestionPath"] AS PipelineInfoIngestionPath,
            json["AppInfo.Id"] AS AppInfoId,
            json["EventInfo.Id"] AS EventInfoId,
            json["EventInfo.BaseType"] AS EventInfoBaseType,
            json["EventINfo.IngestionTime"] AS EventINfoIngestionTime
    FROM @jsonify;

OUTPUT @columnized
TO "adl://xxxx.azuredatalakestore.net/poc/TestResult.csv"
USING Outputters.Csv(quoting : false);

JSON:

{"EventInfo.Source":"JS_default_source","EventInfo.Sequence":"1","EventInfo.Name":"daysofweek","EventInfo.Time":"2018-01-25T21:09:36.779Z","EventInfo.SdkVersion":"ACT-Web-JS-2.6.0","AppInfo.Language":"en","UserInfo.Language":"en-US","UserInfo.TimeZone":"-08:00","DeviceInfo.BrowserName":"Chrome","DeviceInfo.BrowserVersion":"63.0.3239.132","DeviceInfo.OsName":"Mac OS X","DeviceInfo.OsVersion":"10","p1":"V1","PipelineInfo.IngestionTime":"2018-01-25T21:09:33.9930000Z","PipelineInfo.ClientCountry":"CA","PipelineInfo.IngestionPath":"FastPath","EventInfo.BaseType":"custom","EventInfo.IngestionTime":"2018-01-25T21:09:33.9930000Z"}


Solution

  • I got this to work with single quotes and single square brackets, eg

    @columnized = SELECT 
                json["['EventInfo.Source']"] AS EventInfoSource,
    ...
    

    Full code:

    @columnized = SELECT 
                json["['EventInfo.Source']"] AS EventInfoSource,
                json["['EventInfo.InitId']"] AS EventInfoInitId,
                json["['EventInfo.Sequence']"] AS EventInfoSequence,
                json["['EventInfo.Name']"] AS EventInfoName,
                json["['EventInfo.Time']"] AS EventInfoTime,
                json["['EventInfo.SdkVersion']"] AS EventInfoSdkVersion,
                json["['AppInfo.Language']"] AS AppInfoLanguage,
                json["['UserInfo.Language']"] AS UserInfoLanguage,
                json["['DeviceInfo.BrowserName']"] AS DeviceInfoBrowswerName,
                json["['DeviceInfo.BrowserVersion']"] AS BrowswerVersion,
                json["['DeviceInfo.OsName']"] AS DeviceInfoOsName,
                json["['DeviceInfo.OsVersion']"] AS DeviceInfoOsVersion,
                json["['DeviceInfo.Id']"] AS DeviceInfoId,
                json["p1"] AS p1,
                json["['PipelineInfo.AccountId']"] AS PipelineInfoAccountId, 
                json["['PipelineInfo.IngestionTime']"] AS PipelineInfoIngestionTime, 
                json["['PipelineInfo.ClientIp']"] AS PipelineInfoClientIp,
                json["['PipelineInfo.ClientCountry']"] AS PipelineInfoClientCountry,
                json["['PipelineInfo.IngestionPath']"] AS PipelineInfoIngestionPath,
                json["['AppInfo.Id']"] AS AppInfoId,
                json["['EventInfo.Id']"] AS EventInfoId,
                json["['EventInfo.BaseType']"] AS EventInfoBaseType,
                json["['EventINfo.IngestionTime']"] AS EventINfoIngestionTime
        FROM @jsonify;
    

    My results:

    Results