Using XPATH, how to select ANY node that contains a certain string

Let's say I have an XML file like this one:

    <books>
        <book>
            <title>John is alive</title>
            <abstract>
                A man is found alive after having disappeared for 10 years.
            </abstract>
            <description>
                <en> John disappeared 10 years ago. Lorem ipsum dolor sit amet ...</en>
                <fr> Il y a 10 ans, John disparaissait. Lorem ipsum dolor sit amet ...</fr>
            </description>
            <notes>First book in the series, where the character is introduced</notes>
        </book>
        <book>
            <title>The disappearance of John</title>
            <abstract>
                A prequel to the book "John is alive".
            </abstract>
            <description>
                <en> He led an ordinary life, but then ... lorem ipsum dolor sit amet ...</en>
                <fr> Sa vie était tout à fait ordinaire, mais ... lorem ipsum dolor sit amet ...</fr>
            </description>
            <notes>Second book in the "John" series, but first in chronological order</notes>
        </book>
    </books>

My question is simple: how can I, using XPATH, get a collection of all nodes that contain the word John?

Obviously, I can specify a series of nodes and that works fine:

(//title | //abstract | //description/* | //notes)[contains(lower-case(text()),"john")]

But if my XML grows (and it will!), with new elements being added at various levels in the structure, I don't want to constantly have to go back and adjust my XPATH.

What I fail to understand is why a generic statement like

//*[contains(lower-case(text()),"john")]

fails with this error message: Required cardinality of first argument of lower-case() is one or zero.

Yet, not all statements with an asterisk fail.

For instance:

//books/book/*[contains(lower-case(text()),"john")] fails with the above error message, while

//books/book/*/*[contains(lower-case(text()),"john")] succeeds and retrieves both the <en> and <fr> nodes from the first <description> element.

If it's not possible, fine, I will list all elements in my XPATH, but I still would like to get a clear understanding of the behavior of the * selector in the context of a contains() operation.


  • The terms nodes (see XPath difference between child::* and child::node()) and contains (see How to use XPath contains() for specific text?) are both somewhat ambiguous here, but one of the following XPaths will likely meet your needs:

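    1. All nodes whose string value contains the substring, "John":

           //node()[contains(.,"John")]

    2. All such elements:

           //*[contains(.,"John")]

    3. All such attributes:

           //@*[contains(.,"John")]

    4. All such text nodes:

           //text()[contains(.,"John")]

    5. All elements with text node children that contain the substring, "John":

           //*[text()[contains(.,"John")]]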

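    Notice that #1 will include the books element, because its string value (the concatenated text of all its descendants) contains "John", but #5 will exclude it, because none of its own text node children does. See Testing text() nodes vs string values in XPath.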
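To make the difference between a string value and text node children concrete, here is a rough Python sketch. It does not run XPath: the standard library's ElementTree has no contains() function, so the two helpers only emulate the element-based tests (string value vs. own text nodes). The XML is a trimmed, single-book version of the question's sample, and the wrapper elements (books, book, abstract, description) are assumed from the XPaths in the question. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the wrapper elements are an assumption based on the question's XPaths.
XML = """<books>
  <book>
    <title>John is alive</title>
    <abstract>A man is found alive after having disappeared for 10 years.</abstract>
    <description>
      <en>John disappeared 10 years ago.</en>
      <fr>Il y a 10 ans, John disparaissait.</fr>
    </description>
    <notes>First book in the series</notes>
  </book>
</books>"""


def own_text_nodes(elem):
    """Return the text node children of elem itself (not of its descendants)."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t is not None]


def by_string_value(root, needle):
    """Emulate the string-value test: look in the concatenated descendant text."""
    return [e.tag for e in root.iter() if needle in "".join(e.itertext())]


def by_text_child(root, needle):
    """Emulate the text-node-child test: look in each of the element's own text nodes."""
    return [e.tag for e in root.iter()
            if any(needle in t for t in own_text_nodes(e))]


root = ET.fromstring(XML)
string_value_hits = by_string_value(root, "John")
text_child_hits = by_text_child(root, "John")
print(string_value_hits)  # ['books', 'book', 'title', 'description', 'en', 'fr']
print(text_child_hits)    # ['title', 'en', 'fr']
```

The string-value test also matches every ancestor of a matching element, which is usually not what you want when searching for "the nodes that contain John".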

    You can replace contains(.,"John") with contains(lower-case(.),"john") in any of the above XPaths if you're using XPath 2.0. See also Case insensitive XPath contains() possible?
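Using . rather than text() inside lower-case() also explains the cardinality error. In XPath 2.0, lower-case() accepts at most one item. text() returns every text node child of the context element, which can be several once whitespace between child elements is counted, whereas . is always a single node. The sketch below counts text node children per element with Python's ElementTree; it is only an illustration, under the assumption that the XPath processor keeps whitespace-only text nodes, and text_node_count is a hypothetical helper. The XML wrapper elements are again assumed from the question's XPaths.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the wrapper elements are an assumption based on the question's XPaths.
XML = """<books>
  <book>
    <title>John is alive</title>
    <description>
      <en>John disappeared 10 years ago.</en>
      <fr>Il y a 10 ans, John disparaissait.</fr>
    </description>
  </book>
</books>"""


def text_node_count(elem):
    """Count the text node children of elem, including whitespace-only ones."""
    nodes = [elem.text] + [child.tail for child in elem]
    return sum(1 for t in nodes if t is not None)


root = ET.fromstring(XML)
counts = {e.tag: text_node_count(e) for e in root.iter()}
for tag, n in counts.items():
    print(tag, n)
```

Here description has three text node children (the whitespace before, between, and after en and fr), so lower-case(text()) on it receives three items, which would explain why //books/book/* fails. en and fr have one text node each, which is consistent with //books/book/*/* succeeding.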