Search code examples
xmlextractazure-data-lakeu-sql

Print all element in XML extractor in U-Sql


I am new to U-Sql and writing code for extracting xml element. Here is my XML code.

<Dispositions>
  <Disposition>test 1</Disposition>
  <Disposition> test 2</Disposition> 
</Dispositions>

I was trying to display all "Disposition" values. But final output displaying "Test 2" only. I have code as below.

DECLARE @testFile string ="Marlboro20180301000000_child.xml";     
@xmlElementRowPath= EXTRACT Disposition string
    FROM @testFile
         USING new Microsoft.Analytics.Samples.Formats.Xml.XmlExtractor("Dispositions",
         new SQL.MAP<string,string>{
{"Disposition","Disposition"} 
} 
);

@words =
        SELECT Ar.word, COUNT(*) AS count
    FROM @xmlElementRowPath
        CROSS APPLY
            EXPLODE(new SQL.ARRAY<String>( Disposition.Split(','))) AS Ar(word)
    GROUP BY Ar.word;
    //ORDER BY count DESC;
OUTPUT @words
TO "test.csv"
USING Outputters.Csv();

I have been expecting output as

Test 1
Test 2

But receiving only Test 2 as output


Solution

  • I got this to work using the XmlDomExtractor. Some sample code:

    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
    
    // Get the xml using sample extractor
    @input = EXTRACT Disposition string
         FROM "/input/yourXML.xml"
         USING new Microsoft.Analytics.Samples.Formats.Xml.XmlDomExtractor(rowPath:"Disposition",
                                 columnPaths:new SQL.MAP<string, string> { {".", "Disposition"} } );
    
    
    // Do some processing here
    @output = SELECT * FROM @input;
    
    
    // Output
    OUTPUT @output
    TO "/output/output.csv"
    USING Outputters.Csv();
    

    My results:

    My results

    I'm not sure why you need the Split method - does the xml Disposition node hold text that needs to be split by some delimiter? If so, please post some more realistic sample data.