
XPath syntax - how to use position() in a complex location path


I'm writing a recursive algorithm that generates a unique, minimal XPath for a given element. The idea is to let the user pick an element in one document (e.g. an HTML element in Chrome on a PC) and then find the corresponding element in similar documents (e.g. the mobile version of the same site).

As part of this process, I need to generate a full XPath string for an entire document: starting from a given node, traverse the whole tree and append every node, with all of its attributes, to the string.

For instance, for the following document (the wanted element is marked with "*"):

<?xml version="1.0" encoding="UTF-16"?>
<node>
    <node/>
    <node id="content">
        <node>
            <node>
                <node id="url_text_field"/>
                *<node id="go_button" text="Go">
                </node>*
                <node id="back_button" text="Back">
                </node>
            </node>
            <node id="webViewPlaceholder">
                <node/>
            </node>
        </node>
    </node>
</node>

The XPath generated by my code:

//*[@id='go_button' and @text='Go' and parent::*[child::*[@id='url_text_field'] and child::*[@id='back_button' and @text='Back'] and parent::*[child::*[@id='webViewPlaceholder'] and parent::*[@id='content']]]]

yields <node id="go_button" text="Go">, which matches the wanted element exactly.
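As a hypothetical sketch of one step of such a generator (not the asker's actual code), the Java below uses only the JDK's DOM API to turn an element's attributes into the `@name='value' and ...` conjunction seen in the generated XPath. The class and method names are made up for illustration. Attributes are sorted by name so the output is deterministic, and values are assumed not to contain an apostrophe, because XPath 1.0 string literals have no escape for their own quote character. The sibling (`child::*`) and ancestor (`parent::*`) parts of the generated XPath would be built by applying the same helper to the neighbouring elements.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: one step of an XPath generator like the one described
// in the question (illustrative only, not the asker's implementation).
public class AttributePredicate {

    // Turns an element's attributes into "@a='x' and @b='y'".
    // Attributes are sorted by name so the output is deterministic.
    // Assumption: no value contains an apostrophe (XPath 1.0 cannot escape it).
    static String build(Element e) {
        Map<String, String> sorted = new TreeMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            sorted.put(a.getNodeName(), a.getNodeValue());
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> attr : sorted.entrySet()) {
            if (sb.length() > 0) {
                sb.append(" and ");
            }
            sb.append('@').append(attr.getKey()).append("='").append(attr.getValue()).append('\'');
        }
        return sb.toString();
    }

    // Parses an XML string with the JDK's built-in DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Element go = parse("<node id='go_button' text='Go'/>").getDocumentElement();
        System.out.println("//*[" + build(go) + "]");
    }
}
```

Running `main` prints `//*[@id='go_button' and @text='Go']`, the first part of the XPath shown above.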

My problem is that in one particular case, namely when the sub-tree containing the wanted element has one or more identical "brother" sub-trees, I have to use the element's position (position()=SOME_NUMBER, or an index predicate [SOME_NUMBER]) to identify it uniquely, and I'm having trouble with the syntax.

For example, take this more complex document (again, the wanted element is marked with "*"; the index attribute is not part of the original document and was added only for reference):

<?xml version="1.0" encoding="UTF-16"?>
<node>
    <node/>
    <node id="content" index="a">
        <node>
            <node>
                <node id="url_text_field"/>
                <node id="go_button" text="Go" index="a1">
                </node>
                *<node id="go_button" text="Go" index="a2">
                </node>*
                <node id="back_button" text="Back">
                </node>
            </node>
            <node id="webViewPlaceholder">
                <node/>
            </node>
        </node>
    </node>
    <node id="content" index="b">
        <node>
            <node>
                <node id="url_text_field"/>
                <node id="go_button" text="Go" index="b1">
                </node>
                <node id="go_button" text="Go" index="b2">
                </node>
                <node id="back_button" text="Back">
                </node>
            </node>
            <node id="webViewPlaceholder">
                <node/>
            </node>
        </node>
    </node>
</node>

Of course, the previous XPath finds four elements:

<node id="go_button" text="Go" index="a1"></node>
<node id="go_button" text="Go" index="a2"></node>
<node id="go_button" text="Go" index="b1"></node>
<node id="go_button" text="Go" index="b2"></node>

I tried adding a position test in various places in the XPath (for instance, //*[@id='go_button' and @text='Go' and position=2 and parent::*[child::*[@id='url_text_field'] and child::*[@id='back_button' and @text='Back'] and parent::*[child::*[@id='webViewPlaceholder'] and parent::*[@id='content'][1]]]] doesn't work), but I couldn't find a way to match only the second "brother" under the first "parent" sub-tree.
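Why these attempts fail can be checked with the JDK's built-in XPath 1.0 engine (`javax.xml.xpath`). This is a sketch: the class name, the compacted copy of the second document (whitespace and "*" markers removed) and the `indexes` helper, which reports each match's `index` attribute, are illustrative assumptions. Three XPath 1.0 rules explain the results. `position` without parentheses is a node test for a child element named `position`, so `position=2` is false everywhere and the attempt matches nothing. `position()` inside the step's predicate counts all element children of the parent, including `url_text_field`. And a positional predicate attached to a step is evaluated per context node: `[2]` after `//*[...]` picks the second match under each parent, while `[1]` after `parent::*[...]` never filters anything because that axis holds at most one node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PerStepIndexDemo {

    // One "content" sub-tree of the second sample, compacted; idx is "a" or "b".
    static String contentSubtree(String idx) {
        return "<node id='content' index='" + idx + "'><node><node>"
                + "<node id='url_text_field'/>"
                + "<node id='go_button' text='Go' index='" + idx + "1'/>"
                + "<node id='go_button' text='Go' index='" + idx + "2'/>"
                + "<node id='back_button' text='Back'/>"
                + "</node><node id='webViewPlaceholder'><node/></node></node></node>";
    }

    static final String XML = "<node><node/>" + contentSubtree("a") + contentSubtree("b") + "</node>";

    // The XPath generated in the question.
    static final String QUESTION = "//*[@id='go_button' and @text='Go' and parent::*["
            + "child::*[@id='url_text_field'] and child::*[@id='back_button' and @text='Back'] and parent::*["
            + "child::*[@id='webViewPlaceholder'] and parent::*[@id='content']]]]";

    // The attempt from the question, verbatim.
    static final String ATTEMPT = "//*[@id='go_button' and @text='Go' and position=2 and parent::*["
            + "child::*[@id='url_text_field'] and child::*[@id='back_button' and @text='Back'] and parent::*["
            + "child::*[@id='webViewPlaceholder'] and parent::*[@id='content'][1]]]]";

    // Evaluates an XPath against XML and returns the "index" attribute of each match, in document order.
    static List<String> indexes(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("index"));
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(indexes(QUESTION));          // [a1, a2, b1, b2]
        // "position" without () tests for a child element named position: no matches.
        System.out.println(indexes(ATTEMPT));           // []
        // position() counts every element child, so the first Go button is number 2.
        System.out.println(indexes(QUESTION.replace("@text='Go' and", "@text='Go' and position()=2 and"))); // [a1, b1]
        // A trailing [2] on the step is applied per parent: one match in each sub-tree.
        System.out.println(indexes(QUESTION + "[2]"));  // [a2, b2]
        // parent:: yields at most one node, so [1] there never filters anything.
        System.out.println(indexes(QUESTION.replace("parent::*[@id='content']",
                "parent::*[@id='content'][1]")));        // [a1, a2, b1, b2]
    }
}
```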


Solution

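  • The solution is to apply an index predicate to the whole expression instead of calling position() inside it.
    I wrap the entire XPath expression in parentheses and append the index:

    (xpath_expression)[index]

    A predicate written inside a location step, including position()=N, is evaluated against that step's own axis (each parent's children, or the single node on the parent:: axis), so it cannot express "the Nth match in the whole document". The parentheses make [index] apply to the complete result set in document order, so wrapping the XPath from the question and appending [2] selects only the element marked index="a2".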
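The fix can be checked the same way with the JDK's `javax.xml.xpath`. This self-contained sketch repeats the illustrative setup (hypothetical class name, compacted copy of the second document, `indexes` helper that reports each match's `index` attribute) and evaluates the parenthesized form. It also shows an alternative that is not part of the original answer: put the index on the step that is actually repeated, the `id="content"` sub-tree, and then take the second `go_button` beneath it, which states "the second brother under the first parent" directly. Both forms rely on document order, so they identify the element only while the other documents keep the same sibling order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WholeResultIndexDemo {

    // One "content" sub-tree of the second sample, compacted; idx is "a" or "b".
    static String contentSubtree(String idx) {
        return "<node id='content' index='" + idx + "'><node><node>"
                + "<node id='url_text_field'/>"
                + "<node id='go_button' text='Go' index='" + idx + "1'/>"
                + "<node id='go_button' text='Go' index='" + idx + "2'/>"
                + "<node id='back_button' text='Back'/>"
                + "</node><node id='webViewPlaceholder'><node/></node></node></node>";
    }

    static final String XML = "<node><node/>" + contentSubtree("a") + contentSubtree("b") + "</node>";

    // The XPath generated in the question.
    static final String QUESTION = "//*[@id='go_button' and @text='Go' and parent::*["
            + "child::*[@id='url_text_field'] and child::*[@id='back_button' and @text='Back'] and parent::*["
            + "child::*[@id='webViewPlaceholder'] and parent::*[@id='content']]]]";

    // Evaluates an XPath against XML and returns the "index" attribute of each match, in document order.
    static List<String> indexes(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("index"));
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // The answer's fix: the parentheses make [2] index the whole result set.
        System.out.println(indexes("(" + QUESTION + ")[2]"));
        // Alternative (not from the answer): index the repeated "content" sub-tree first,
        // then take the second Go button beneath it.
        System.out.println(indexes("(//*[@id='content'])[1]//*[@id='go_button' and @text='Go'][2]"));
    }
}
```

Both lines print `[a2]`.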