Search code examples
azure-data-lakeu-sql

U-SQL with XmlExtractor - elements inside elements


In U-SQL I am trying to get a list of elements inside elements, using the XmlExtractor. But I cannot get the nested collection.

It is a list of items, which has locations. With the XmlExtractor I can get a collection of elements, but I don't see how I can get a collection that contains a collection. An XML sample is shown below.

Any ideas?

<root>
<Item>
    <Header>
        <id>111</id>
    </Header>
    <Body>
        <Locations>
            <Location>
                <Station>k4</Station>
                <Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
            </Location>
            <Location>
                <Station>k5</Station>
                <Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
            </Location>
        </Locations>
    </Body>
</Item>
<Item>
    <Header>
        <id>222</id>
    </Header>
    <Body>
        <Locations>
            <Location>
                <Station>k4</Station>
                <Timestamp>2017-08-30T02:12:36.1218601+02:00</Timestamp>
            </Location>
            <Location>
                <Station>k5</Station>
                <Timestamp>2017-08-30T02:12:36.1218601+02:00</Timestamp>
            </Location>
        </Locations>
    </Body>
</Item>
</root>

Solution

  • Solved by making an extractor that takes the XML in one string, and then calls a method using xpath, returning an SQL.Array, where the string has comma separated values of of the result. The result looks like this:

    111;k4,2017-08-30T02:04:18.2506945+02:00
    111;k5,2017-08-30T02:04:18.2506945+02:00
    222;k4,2017-08-30T02:12:36.1218601+02:00
    222;k5,2017-08-30T02:12:36.1218601+02:00
    

    The standard XmlExtractor cannot do this, and I also decided that it is better to postpone the parsing of the xml to after it has been extracted, because there can be multiple steps on the same xml.