XPATH: find all elements with same value until the value changes

Here's a sample XML:

<?xml version="1.0" ?>
<someparent>
    <somechild>
        <description>I want this</description>
        <id>98</id>
    </somechild>
    <somechild>
        <description>I don't want this</description>
        <id>98</id>
    </somechild>
    <somechild>
        <description>I want this too</description>
        <id>2</id>
    </somechild>
    <somechild>
        <description>Nope, not that one</description>
        <id>2</id>
    </somechild>
    <somechild>
        <description>Not that one either</description>
        <id>2</id>
    </somechild>
    <somechild>
        <description>Yep, I want this</description>
        <id>41</id>
    </somechild>
</someparent>

The <id> elements are always grouped: all elements with the same <id> value follow each other in the document. I may have thousands of different <id>s in a single file. What I want is to find each <somechild> element that is the first occurrence of its corresponding <id> group. So my expected result would be:

    <somechild>
        <description>I want this</description>
        <id>98</id>
    </somechild>
    <somechild>
        <description>I want this too</description>
        <id>2</id>
    </somechild>
    <somechild>
        <description>Yep, I want this</description>
        <id>41</id>
    </somechild>

I need a single XPATH command to select all of these "first items in a group". I have tried various combinations of following-sibling and preceding-sibling axes, but I can't get it just quite right. I have come very close to what I want to achieve with the following statement:

//someparent/somechild/id[text()=parent::somechild/preceding-sibling::somechild/id[text()]]/parent::somechild

This actually returns all the nodes I don't want, as it selects all the items that are not the first in their group (so it's essentially a perfect negative of what I want!). But for the life of me, I haven't been able to figure out how to reverse the results.

Any help woud be kindly appreciated.

Solution

This O(n²) XPath 1.0 expression,

//someparent/somechild[not(id = preceding-sibling::somechild/id)]

will select all somechild elements that have no preceding siblings with the same id child element,

   <somechild>
        <description>I want this</description>
        <id>98</id>
    </somechild>
    <somechild>
        <description>I want this too</description>
        <id>2</id>
    </somechild>
    <somechild>
        <description>Yep, I want this</description>
        <id>41</id>
    </somechild>

as requested.

Update

Michael Kay noted helpfully that the above XPath has an algorithmic complexity of O(n²), because for each child sibling, all preceding siblings are compared. This won't matter for small numbers of siblings, but OP mentioned thousands, so the size issue becomes a concern.

See his XPath 3.1 solution, which is a much better O(n).

He further observed that an O(n) XPath 1.0 expression is possible as long as only immediately preceding siblings have to be checked:

//someparent/somechild[not(id = preceding-sibling::somechild[1]/id)]
                                                            ^^^

This lower complexity XPath will yield the same results for OP's sample case.

A differentiating case would involve later siblings with id values that repeat earlier clusters of id values. For example, adding another cluster of id siblings with 98 values:

<someparent>
  <somechild>
    <description>I want this</description>
    <id>98</id>
  </somechild>
  <somechild>
    <description>I don't want this</description>
    <id>98</id>
  </somechild>
  <somechild>
    <description>I want this too</description>
    <id>2</id>
  </somechild>
  <somechild>
    <description>Nope, not that one</description>
    <id>2</id>
  </somechild>
  <somechild>
    <description>Not that one either</description>
    <id>2</id>
  </somechild>
  <somechild>
    <description>Yep, I want this</description>
    <id>41</id>
  </somechild>
  <somechild>
    <description>REPEAT CASE 1</description>
    <id>98</id>
  </somechild>  
  <somechild>
    <description>REPEAT CASE 2</description>
    <id>98</id>
  </somechild>
</someparent>

The difference is that the O(n) XPath will not include the REPEAT CASE 1 somechild element, but the O(n²) XPath will include the distantly repeated REPEAT CASE 1:

<somechild>
    <description>I want this</description>
    <id>98</id>
</somechild>
<somechild>
    <description>I want this too</description>
    <id>2</id>
</somechild>
<somechild>
    <description>Yep, I want this</description>
    <id>41</id>
</somechild>
<somechild>
  <description>REPEAT CASE 1</description>
  <id>98</id>
</somechild>

As long as the requirements do not require non-immediate comparisons, use the more efficient O(n) XPath.