Search code examples
sqlsql-serverxmlxquery

How to parse latest entry only (based on index attribute) from an XML in SQL Server 2019 (v15)?


I am working on XML files with the following structure. I would like to select the latest entry (this is where d5p1:index="0") of the entries where d5p1:id="W45A" or d5p1:id="W450", while ignoring any older entries.

My desired output would look something like this. Note that I'd ideally also include the W45025 as null, despite it not being included in the latest entry at all:

Id Term
W45A01 20211221
W45A02 4013072
W45A08 4
W45001 20211130
W45022 4043072
W45023 993344
W45024 0
W45025 NULL
<UcSwedishIndividualReport xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Content>
    <status xmlns:d3p1="http://www.uc.se/schemas/ucOrderReply/" xmlns="http://www.uc.se/schemas/ucOrderReply/" d3p1:result="ok" />
    <ucReport xmlns="http://www.uc.se/schemas/ucOrderReply/">
      <xmlReply>
        <reports xmlns:d5p1="http://www.uc.se/schemas/ucOrderReply/" d5p1:lang="eng">
          <d5p1:report d5p1:id="6002124815" d5p1:name="'FirstName LastName" d5p1:styp="K39" d5p1:index="0">

            <d5p1:group d5p1:id="W45A" d5p1:index="0" d5p1:key="" d5p1:name="Credit commitments">
              <d5p1:term d5p1:id="W45A01">20211221</d5p1:term>
              <d5p1:term d5p1:id="W45A02">4013072</d5p1:term>
              <d5p1:term d5p1:id="W45A08">4</d5p1:term>
            </d5p1:group>
              
            <d5p1:group d5p1:id="W450" d5p1:index="0" d5p1:key="" d5p1:name="Credit commitments">
              <d5p1:term d5p1:id="W45001">20211130</d5p1:term>
              <d5p1:term d5p1:id="W45022">4043072</d5p1:term>
              <d5p1:term d5p1:id="W45023">993344</d5p1:term>
              <d5p1:term d5p1:id="W45024">0</d5p1:term>
            </d5p1:group>
              
            <d5p1:group d5p1:id="W450" d5p1:index="1" d5p1:key="" d5p1:name="Credit commitments">
              <d5p1:term d5p1:id="W45001">20210930</d5p1:term>
              <d5p1:term d5p1:id="W45022">4240629</d5p1:term>
              <d5p1:term d5p1:id="W45023">1185833</d5p1:term>
              <d5p1:term d5p1:id="W45024">0</d5p1:term>
            </d5p1:group>
              
            <d5p1:group d5p1:id="W450" d5p1:index="2" d5p1:key="" d5p1:name="Credit commitments">
              <d5p1:term d5p1:id="W45001">20210731</d5p1:term>
              <d5p1:term d5p1:id="W45022">4254142</d5p1:term>
              <d5p1:term d5p1:id="W45023">1194268</d5p1:term>
              <d5p1:term d5p1:id="W45024">0</d5p1:term>
              <d5p1:term d5p1:id="W45025">3029874</d5p1:term>
            </d5p1:group>
              
            <d5p1:group d5p1:id="W495" d5p1:index="0" d5p1:key="" d5p1:name="Income info">
               <d5p1:term d5p1:id="W49501">2021</d5p1:term>
               <d5p1:term d5p1:id="W49517">411000</d5p1:term>
               <d5p1:term d5p1:id="W49518">53400</d5p1:term>
               <d5p1:term d5p1:id="W49522">410900</d5p1:term>
               <d5p1:term d5p1:id="W49541">75200</d5p1:term>
               <d5p1:term d5p1:id="W49591">75200</d5p1:term>
               <d5p1:term d5p1:id="W49592">464300</d5p1:term>
            </d5p1:group>
              
          </d5p1:report>
        </reports>
      </xmlReply>
    </ucReport>
  </Content>
</UcSwedishIndividualReport>

My current code only grabs all of the values, regardless of their index or of their id. The problem is that I can't differ the old entries from the most recent ones with this method. I've previously tried to differ the values later on by ranking the rows in the order that they were read and then selecting the variables with the lowest row rank, but it seems like the code doesn't always load the table starting from id 0, which breaks my logic.

drop table #UC
;WITH XMLNAMESPACES('http://www.uc.se/schemas/ucOrderReply/' AS ns,'http://www.uc.se/schemas/ucOrderReply/' AS d5p1)
SELECT ok.*
    ,X.g.value('(@d5p1:id)','varchar(20)') AS id
    ,X.g.value('(text())[1]','varchar(20)') AS term
into #UC
FROM #1 as ok
CROSS APPLY ok.[Message].nodes('UcSwedishIndividualReport/Content/ns:ucReport/ns:xmlReply/ns:reports/ns:report/ns:group/ns:term') X(g)

Solution

  • I suppose you can use XPath to filter the specific <d5p1:group> nodes and their children. Simplified XPath would be:

    //d5p1:group[@d5p1:id="W45A" or @d5p1:id="W450"][@d5p1:index="0"]/d5p1:term
    

    And simplified query:

    SELECT term.value('@d5p1:id', 'varchar(100)') AS id, term.value('.', 'varchar(100)') AS term
    FROM ... AS t(doc)
    CROSS APPLY doc.nodes('//d5p1:group[@d5p1:id="W45A" or @d5p1:id="W450"][@d5p1:index="0"]/d5p1:term') x(term)
    

    DB<>Fiddle