I am new to U-Sql and writing code for extracting xml element. Here is my XML code.
<Dispositions>
<Disposition>test 1</Disposition>
<Disposition> test 2</Disposition>
</Dispositions>
I was trying to display all "Disposition" values. But final output displaying "Test 2" only. I have code as below.
DECLARE @testFile string ="Marlboro20180301000000_child.xml";
@xmlElementRowPath= EXTRACT Disposition string
FROM @testFile
USING new Microsoft.Analytics.Samples.Formats.Xml.XmlExtractor("Dispositions",
new SQL.MAP<string,string>{
{"Disposition","Disposition"}
}
);
@words =
SELECT Ar.word, COUNT(*) AS count
FROM @xmlElementRowPath
CROSS APPLY
EXPLODE(new SQL.ARRAY<String>( Disposition.Split(','))) AS Ar(word)
GROUP BY Ar.word;
//ORDER BY count DESC;
OUTPUT @words
TO "test.csv"
USING Outputters.Csv();
I have been expecting output as
Test 1
Test 2
But receiving only Test 2
as output
I got this to work using the XmlDomExtractor
. Some sample code:
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
// Get the xml using sample extractor
@input = EXTRACT Disposition string
FROM "/input/yourXML.xml"
USING new Microsoft.Analytics.Samples.Formats.Xml.XmlDomExtractor(rowPath:"Disposition",
columnPaths:new SQL.MAP<string, string> { {".", "Disposition"} } );
// Do some processing here
@output = SELECT * FROM @input;
// Output
OUTPUT @output
TO "/output/output.csv"
USING Outputters.Csv();
My results:
I'm not sure why you need the Split
method - does the xml Disposition
node hold text that needs to be split by some delimiter? If so, please post some more realistic sample data.