Tags: xml, xpath, xpath-1.0

Can I refactor to avoid "self::" and "parent::"?


I want to extract //pre and //code elements but exclude //pre/code. For example:

<root>
    <pre><code>foo</code></pre>
    <code>bar</code>
    <pre>baz</pre>
    <span>ignore me<code>select me</code></span>
</root>

I want to retrieve four elements:

  1. <pre><code>foo</code></pre>
  2. <code>bar</code>
  3. <pre>baz</pre>
  4. <code>select me</code>

(And I specifically don't want <code>foo</code>)

The following XPath seems to do the trick:

//*[(self::pre or self::code) and not (self::code and parent::pre)]

I don't know if that's the right approach, but it seems to work.

Is there a less verbose way to express this (e.g. that doesn't require self:: and parent::)?


Solution

  • Eliminating self:: and parent:: isn't a worthwhile goal in itself. You may be looking for abbreviations of those axes in the hope that they allow a shorter, equivalent expression.

    This is understandable given that, for example, a path written with the child axis,

    /child::a/child::b
    

    can be written more concisely as

    /a/b
    

    What are the parallel abbreviations for self:: and parent::?

    • self::node() can be abbreviated .
    • parent::node() can be abbreviated ..

    However, these abbreviations match any node regardless of its name, so they are useful only when the name of the context node or of its parent is immaterial. That is not so in your case, where the predicate must test for code and pre specifically. (For example, ./ starts a relative path, as opposed to / for an absolute path; ../@attr refers to the attr attribute of the parent element, as opposed to @attr for the attribute of the context element.)
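To see that the abbreviated forms carry no name test, here is a minimal sketch using Python's standard-library ElementTree, whose limited XPath subset supports . and .. (but not the self:: or parent:: axes). It reuses the sample document from the question; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<root>
    <pre><code>foo</code></pre>
    <code>bar</code>
    <pre>baz</pre>
    <span>ignore me<code>select me</code></span>
</root>"""

root = ET.fromstring(XML)

# "." is the context node itself, so ".//code" means
# "all code descendants of root".
codes = root.findall(".//code")

# ".." steps to the parent of each code element, whatever that parent
# is called: no name test is applied.
parents = root.findall(".//code/..")

print([c.text for c in codes])
print([p.tag for p in parents])
```

The second query returns the pre, root, and span elements alike, which is why .. cannot stand in for a test such as parent::pre.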

    So, in short, other than logical simplification as suggested by @JLRishe, your XPath is reasonably simple already. Axis abbreviations aren't going to be of much help.
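As a rough check of the predicate's logic, the sketch below mirrors it in plain Python rather than evaluating the XPath itself, because the standard-library ElementTree does not support the self:: and parent:: axes. The second function encodes one possible simplification, equivalent to //pre | //code[not(parent::pre)]; this is an assumption for illustration, since the suggestion referred to above is not reproduced here. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<root>
    <pre><code>foo</code></pre>
    <code>bar</code>
    <pre>baz</pre>
    <span>ignore me<code>select me</code></span>
</root>"""


def build_parent_map(root):
    """ElementTree elements do not know their parent, so map child -> parent."""
    return {child: parent for parent in root.iter() for child in parent}


def has_pre_parent(elem, parents):
    parent = parents.get(elem)
    return parent is not None and parent.tag == "pre"


def original_form(root):
    """Mirrors //*[(self::pre or self::code) and not(self::code and parent::pre)]."""
    parents = build_parent_map(root)
    return [
        e for e in root.iter()  # iter() walks the tree in document order
        if (e.tag == "pre" or e.tag == "code")
        and not (e.tag == "code" and has_pre_parent(e, parents))
    ]


def simplified_form(root):
    """Mirrors //pre | //code[not(parent::pre)] (an assumed simplification)."""
    parents = build_parent_map(root)
    return [
        e for e in root.iter()
        if e.tag == "pre"
        or (e.tag == "code" and not has_pre_parent(e, parents))
    ]


root = ET.fromstring(XML)
original = original_form(root)
simplified = simplified_form(root)

# Both forms pick the same elements in the same order.
print(original == simplified)
for elem in original:
    print(ET.tostring(elem, encoding="unicode").strip())
```

The two forms agree because an element cannot be both pre and code, so the exclusion only ever applies to the code branch. With an XPath 1.0 engine available, the expressions can be evaluated directly instead.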