xml xpath xpath-1.0

XPath 1.0 expression [iterator, if any]


I am struggling to loop over a variable number of elements with the same name and extract their values. I would like to do this with a single XPath expression.

My XML looks like this:

<myXMLNode>
    <sunnyDay>YES</sunnyDay>
    <snowing>NO</snowing>   
    <temperatureInCelsius>
        <Date>2013-06-01</Date>
        <Date>2013-06-30</Date>
        <Date>2013-07-01</Date>
    </temperatureInCelsius>
</myXMLNode>

I want to extract the values of all Date elements, separated by pipes. The number of Date elements varies (there are 3 in the example above). Example output: 2013-06-01|2013-06-30|2013-07-01

I tried the following, with no luck:

1. concat(//myXMLNode/temperatureInCelsius/Date[1], "_" ,//myXMLNode/temperatureInCelsius/Date[2], "_" ,//myXMLNode/temperatureInCelsius/Date[3])

2. //myXMLNode/temperatureInCelsius/Date[position()>0 or position()<=count(myXMLNode/temperatureInCelsius/Date)

3. //myXMLNode/temperatureInCelsius/Date[position()>0 and position()<=count(myXMLNode/temperatureInCelsius/Date)

Solution

  • The XPath expression that selects all the relevant Date elements is

    /myXMLNode/temperatureInCelsius/Date
    

    or possibly

    /myXMLNode/temperatureInCelsius/Date/text()
    

    to select text nodes directly.

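    Concatenating those results with a separator such as | should be done not in XPath, but in the host language or environment that you are using. XPath 1.0 has no iterator and no string-join() function (that was introduced in XPath 2.0), and concat() only accepts a fixed list of arguments, so it cannot adapt to a varying number of Date elements. For instance, the join is straightforward to do in Python: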

    >>> from lxml import etree
    >>> document_string = """<myXMLNode>
    ...     <sunnyDay>YES</sunnyDay>
    ...     <snowing>NO</snowing>
    ...     <temperatureInCelsius>
    ...         <Date>2013-06-01</Date>
    ...         <Date>2013-06-30</Date>
    ...         <Date>2013-07-01</Date>
    ...     </temperatureInCelsius>
    ... </myXMLNode>"""
    >>> root = etree.fromstring(document_string)
    >>> dates = root.xpath("/myXMLNode/temperatureInCelsius/Date/text()")
    >>> dates
    ['2013-06-01', '2013-06-30', '2013-07-01']
    >>> "|".join(dates)
    '2013-06-01|2013-06-30|2013-07-01'
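
If lxml is not available, the same idea can be sketched with Python's standard library. This is only an illustration under a few assumptions: the function name join_dates and its separator parameter are made up for this example, and ElementTree supports only a limited XPath subset with paths evaluated relative to the root element rather than from the document root.

```python
import xml.etree.ElementTree as ET


def join_dates(xml_text, separator="|"):
    """Return the text of every Date element, joined by `separator`.

    Hypothetical helper for illustration; not part of any library.
    """
    root = ET.fromstring(xml_text)
    # root is already <myXMLNode>, so the path starts below it.
    date_elements = root.findall("./temperatureInCelsius/Date")
    # Treat an empty <Date/> element as an empty string instead of None.
    values = [(element.text or "").strip() for element in date_elements]
    return separator.join(values)


sample = """<myXMLNode>
    <sunnyDay>YES</sunnyDay>
    <snowing>NO</snowing>
    <temperatureInCelsius>
        <Date>2013-06-01</Date>
        <Date>2013-06-30</Date>
        <Date>2013-07-01</Date>
    </temperatureInCelsius>
</myXMLNode>"""

print(join_dates(sample))  # 2013-06-01|2013-06-30|2013-07-01
```

Because the join happens in the host language, the result adapts to however many Date elements are present, including none, in which case an empty string is returned.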