c# xml xpath

XPath to select a node based on the following node but only if that node contains a certain element


Given this XML document stub (whose generation I can't modify):

<Datastream parserApplicationName="mscorlib" parserApplicationVersion="4.0.0.0" parserAssemblyName="Prophet21.Datastream" parserAssemblyVersion="23.1.1.0">
  <JOBXXXXDEF type="1" typeName="JobHeader" key="{a52722bf-c784-4d0a-b80c-d60a55179cb5}" InputFileName="">
   
    <COPIES>1</COPIES>
  
      <HDRXXXXDEF >
        <TITLE>QUOTATION</TITLE>
        <ORDER_ACK_NUMBER>1000998</ORDER_ACK_NUMBER>
       
      </HDRXXXXDEF>

      <LINEXXXDEF lineno ="1" >
        <ORDERED_QTY>1.00</ORDERED_QTY>
        <ORDER_UOM>EA</ORDER_UOM>
      </LINEXXXDEF>

      <LINEXXXDEF lineno ="2">
        <EXDSCXXDEF >
          <EXTENDED_DESCRIPTION>Delivery: 17-20 Weeks</EXTENDED_DESCRIPTION>
        </EXDSCXXDEF>
      </LINEXXXDEF>

      <LINEXXXDEF lineno ="3">
        <ASCMPXXDEF >
          <QTY_PER_ASSEMBLY>1.00</QTY_PER_ASSEMBLY>
        </ASCMPXXDEF>
      </LINEXXXDEF>

      <LINEXXXDEF lineno ="4">
        <EXDSCXXDEF >
          <EXTENDED_DESCRIPTION>Stainless Steel Design</EXTENDED_DESCRIPTION>
        </EXDSCXXDEF>
      </LINEXXXDEF>

      <LINEXXXDEF lineno ="5">
        <ASCMPXXDEF >
          <QTY_PER_ASSEMBLY>1.00</QTY_PER_ASSEMBLY>
        </ASCMPXXDEF>
      </LINEXXXDEF>
    
      <LINEXXXDEF lineno ="6">
        <ORDERED_QTY>1.00</ORDERED_QTY>
        <ORDER_UOM>EA</ORDER_UOM>
      
      </LINEXXXDEF>
      <LINEXXXDEF lineno ="7">
        <EXDSCXXDEF >
          <EXTENDED_DESCRIPTION>Added to the above if required.</EXTENDED_DESCRIPTION>
        </EXDSCXXDEF>
      </LINEXXXDEF>

      <TOTALSXDEF >
        <SUBTOTXDEF >
          <TOTAL_LINES>2</TOTAL_LINES>
          <SUB_TOTAL>1,880.00</SUB_TOTAL>
          <TAXES>0.00</TAXES>
          <TOTAL_ECO_FEE>0.00</TOTAL_ECO_FEE>
          <RETAIL_DELIVERY_FEE>0.00</RETAIL_DELIVERY_FEE>
          <JURISDICTION_DESC />
        </SUBTOTXDEF>
        <GRDTOTXDEF >
          <GRAND_TOTAL>1,880.00</GRAND_TOTAL>
          <CURRENCY_DESC>U.S. Dollars</CURRENCY_DESC>
        </GRDTOTXDEF>
      </TOTALSXDEF>
    </FORMXXXDEF>
  </JOBXXXXDEF>
</Datastream>

I need an XPath query (or queries) that selects every LINEXXXDEF node that has a child named ASCMPXXDEF and, in addition, the LINEXXXDEF immediately after such a node if that next node has a child named EXDSCXXDEF. In the document above, that means the nodes for lines 3, 4 and 5:

 <LINEXXXDEF lineno ="3">
        <ASCMPXXDEF >
          <QTY_PER_ASSEMBLY>1.00</QTY_PER_ASSEMBLY>
        </ASCMPXXDEF>
      </LINEXXXDEF>

      <LINEXXXDEF lineno ="4">
        <EXDSCXXDEF >
          <EXTENDED_DESCRIPTION>Stainless Steel Design</EXTENDED_DESCRIPTION>
        </EXDSCXXDEF>
      </LINEXXXDEF>

      <LINEXXXDEF lineno ="5">
        <ASCMPXXDEF >
          <QTY_PER_ASSEMBLY>1.00</QTY_PER_ASSEMBLY>
        </ASCMPXXDEF>
      </LINEXXXDEF>

I need line 3 because it has an ASCMPXXDEF child, line 4 because it has an EXDSCXXDEF child and directly follows line 3, and line 5 because it again has an ASCMPXXDEF child.

The closest I've come is this XPath:

//ASCMPXXDEF/parent::*/following-sibling::LINEXXXDEF[EXDSCXXDEF][1] | //ASCMPXXDEF/parent::*

but this is returning

 <LINEXXXDEF lineno ="3">
        <ASCMPXXDEF >
          <QTY_PER_ASSEMBLY>1.00</QTY_PER_ASSEMBLY>
        </ASCMPXXDEF>
      </LINEXXXDEF>

      <LINEXXXDEF lineno ="4">
        <EXDSCXXDEF >
          <EXTENDED_DESCRIPTION>Stainless Steel Design</EXTENDED_DESCRIPTION>
        </EXDSCXXDEF>
      </LINEXXXDEF>

      <LINEXXXDEF lineno ="5">
        <ASCMPXXDEF >
          <QTY_PER_ASSEMBLY>1.00</QTY_PER_ASSEMBLY>
        </ASCMPXXDEF>
      </LINEXXXDEF>

<LINEXXXDEF lineno ="7">
        <EXDSCXXDEF >
          <EXTENDED_DESCRIPTION>Added to the above if required.</EXTENDED_DESCRIPTION>
        </EXDSCXXDEF>
      </LINEXXXDEF>

This includes line 7, which I do NOT want, because line 7 does NOT directly follow a LINEXXXDEF node that contains ASCMPXXDEF.

Is there a way to have XPath evaluate ONLY the NEXT sibling? I'm getting line 7 because line 5 contains ASCMPXXDEF, but I only want an EXDSCXXDEF line if it comes directly after it, that is, in position 6. Line 6 should restart my evaluation, but the XPath skips over it and picks up line 7 simply because it comes somewhere after line 5.

I'm also working in C#, in case there's an easier way to do this there.

In other words, I'm trying to exclude any node that contains EXDSCXXDEF but is NOT immediately after one containing ASCMPXXDEF. The result should be:

<LINEXXXDEF lineno ="3">
        <ASCMPXXDEF >
          <QTY_PER_ASSEMBLY>1.00</QTY_PER_ASSEMBLY>
        </ASCMPXXDEF>
      </LINEXXXDEF>

      <LINEXXXDEF lineno ="4">
        <EXDSCXXDEF >
          <EXTENDED_DESCRIPTION>Stainless Steel Design</EXTENDED_DESCRIPTION>
        </EXDSCXXDEF>
      </LINEXXXDEF>

      <LINEXXXDEF lineno ="5">
        <ASCMPXXDEF >
          <QTY_PER_ASSEMBLY>1.00</QTY_PER_ASSEMBLY>
        </ASCMPXXDEF>
      </LINEXXXDEF>

This node below should not be included:
<LINEXXXDEF lineno ="7">
        <EXDSCXXDEF >
          <EXTENDED_DESCRIPTION>Added to the above if required.</EXTENDED_DESCRIPTION>
        </EXDSCXXDEF>
      </LINEXXXDEF>

Solution

  • The following XPath will get what you want.

    /Datastream/JOBXXXXDEF/LINEXXXDEF[
      ASCMPXXDEF 
      or (EXDSCXXDEF and preceding-sibling::LINEXXXDEF[1][ASCMPXXDEF ])
    ]
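
Since the question mentions C#, here is a minimal sketch of running that expression with `XmlDocument.SelectNodes`. The class name `XPathDemo`, the helper `SelectLineNumbers`, and the trimmed sample document are illustrative assumptions, not part of the original post; the sample assumes the LINEXXXDEF elements are direct children of JOBXXXXDEF, as the expression does.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathDemo
{
    // Trimmed, well-formed stand-in for the question's document (assumption:
    // each LINEXXXDEF is a direct child of JOBXXXXDEF; child content is omitted).
    public const string SampleXml = @"<Datastream>
  <JOBXXXXDEF>
    <LINEXXXDEF lineno='1'><ORDERED_QTY>1.00</ORDERED_QTY></LINEXXXDEF>
    <LINEXXXDEF lineno='2'><EXDSCXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='3'><ASCMPXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='4'><EXDSCXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='5'><ASCMPXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='6'><ORDERED_QTY>1.00</ORDERED_QTY></LINEXXXDEF>
    <LINEXXXDEF lineno='7'><EXDSCXXDEF/></LINEXXXDEF>
  </JOBXXXXDEF>
</Datastream>";

    // The expression from the answer, joined onto one line.
    public const string Query =
        "/Datastream/JOBXXXXDEF/LINEXXXDEF[" +
        "ASCMPXXDEF or (EXDSCXXDEF and preceding-sibling::LINEXXXDEF[1][ASCMPXXDEF])]";

    // Returns the lineno attribute of every LINEXXXDEF the expression selects.
    public static List<string> SelectLineNumbers(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            result.Add(node.Attributes["lineno"].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Prints: 3,4,5
        Console.WriteLine(string.Join(",", SelectLineNumbers(SampleXml, Query)));
    }
}
```

Lines 2 and 7 both contain EXDSCXXDEF but are left out, because the LINEXXXDEF directly before each of them (lines 1 and 6) has no ASCMPXXDEF child.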
    


    The logic is as follows:

    • Either this node itself has an ASCMPXXDEF child
    • Or it has an EXDSCXXDEF child and ...
      • ... its nearest preceding LINEXXXDEF sibling (the [1]) itself has an ASCMPXXDEF child.
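
If you would rather express that logic in C# than in XPath, the conditions above map fairly directly onto LINQ to XML. This is only a sketch: `LinqDemo`, `SelectLineNumbers`, and the trimmed sample document are hypothetical names and data introduced for illustration. `ElementsBeforeSelf("LINEXXXDEF").LastOrDefault()` plays the role of `preceding-sibling::LINEXXXDEF[1]`, since `ElementsBeforeSelf` returns siblings in document order and the last one is the nearest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinqDemo
{
    // Trimmed stand-in for the question's document (assumption: child content omitted).
    public const string SampleXml = @"<Datastream>
  <JOBXXXXDEF>
    <LINEXXXDEF lineno='1'><ORDERED_QTY>1.00</ORDERED_QTY></LINEXXXDEF>
    <LINEXXXDEF lineno='2'><EXDSCXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='3'><ASCMPXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='4'><EXDSCXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='5'><ASCMPXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='6'><ORDERED_QTY>1.00</ORDERED_QTY></LINEXXXDEF>
    <LINEXXXDEF lineno='7'><EXDSCXXDEF/></LINEXXXDEF>
  </JOBXXXXDEF>
</Datastream>";

    // Keeps a LINEXXXDEF when it has an ASCMPXXDEF child, or when it has an
    // EXDSCXXDEF child and the nearest preceding LINEXXXDEF sibling has ASCMPXXDEF.
    public static List<string> SelectLineNumbers(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("LINEXXXDEF")
            .Where(line =>
                line.Element("ASCMPXXDEF") != null
                || (line.Element("EXDSCXXDEF") != null
                    && line.ElementsBeforeSelf("LINEXXXDEF")
                           .LastOrDefault()
                           ?.Element("ASCMPXXDEF") != null))
            .Select(line => (string)line.Attribute("lineno"))
            .ToList();
    }

    public static void Main()
    {
        // Prints: 3,4,5
        Console.WriteLine(string.Join(",", SelectLineNumbers(SampleXml)));
    }
}
```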

    So we use preceding-sibling rather than following-sibling, because what matters is what immediately precedes the EXDSCXXDEF node, not what follows the ASCMPXXDEF one.
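
It may also help to see why the expression in the question picked up line 7. Predicates are applied left to right, so `[EXDSCXXDEF][1]` first keeps only the following siblings that contain EXDSCXXDEF and then takes the first of those, however far away it is. Swapping the order to `[1][EXDSCXXDEF]` takes the immediate next LINEXXXDEF first and only then tests it. The sketch below compares the two on a trimmed sample; `PredicateOrderDemo`, `Select`, and the sample data are assumptions made for illustration, and the results are sorted only to make the printed output stable.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class PredicateOrderDemo
{
    // Trimmed stand-in for the question's document (assumption: child content omitted).
    public const string SampleXml = @"<Datastream>
  <JOBXXXXDEF>
    <LINEXXXDEF lineno='1'><ORDERED_QTY>1.00</ORDERED_QTY></LINEXXXDEF>
    <LINEXXXDEF lineno='2'><EXDSCXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='3'><ASCMPXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='4'><EXDSCXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='5'><ASCMPXXDEF/></LINEXXXDEF>
    <LINEXXXDEF lineno='6'><ORDERED_QTY>1.00</ORDERED_QTY></LINEXXXDEF>
    <LINEXXXDEF lineno='7'><EXDSCXXDEF/></LINEXXXDEF>
  </JOBXXXXDEF>
</Datastream>";

    // The question's expression: filter by EXDSCXXDEF, then take the first match.
    public const string Original =
        "//ASCMPXXDEF/parent::*/following-sibling::LINEXXXDEF[EXDSCXXDEF][1]" +
        " | //ASCMPXXDEF/parent::*";

    // Predicates swapped: take the immediate next LINEXXXDEF, then require EXDSCXXDEF.
    public const string Reordered =
        "//ASCMPXXDEF/parent::*/following-sibling::LINEXXXDEF[1][EXDSCXXDEF]" +
        " | //ASCMPXXDEF/parent::*";

    // Returns the selected lineno values, sorted for stable display.
    public static List<string> Select(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            result.Add(node.Attributes["lineno"].Value);
        }
        result.Sort(StringComparer.Ordinal);
        return result;
    }

    public static void Main()
    {
        // Prints: 3,4,5,7
        Console.WriteLine(string.Join(",", Select(SampleXml, Original)));
        // Prints: 3,4,5
        Console.WriteLine(string.Join(",", Select(SampleXml, Reordered)));
    }
}
```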