I am working on XML files with the following structure. I would like to select the latest entry (this is where d5p1:index="0") of the entries where d5p1:id="W45A" or d5p1:id="W450", while ignoring any older entries.
My desired output would look something like this. Note that I'd ideally also include the W45025 as null, despite it not being included in the latest entry at all:
Id | Term |
---|---|
W45A01 | 20211221 |
W45A02 | 4013072 |
W45A08 | 4 |
W45001 | 20211130 |
W45022 | 4043072 |
W45023 | 993344 |
W45024 | 0 |
W45025 | NULL |
<UcSwedishIndividualReport xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Content>
<status xmlns:d3p1="http://www.uc.se/schemas/ucOrderReply/" xmlns="http://www.uc.se/schemas/ucOrderReply/" d3p1:result="ok" />
<ucReport xmlns="http://www.uc.se/schemas/ucOrderReply/">
<xmlReply>
<reports xmlns:d5p1="http://www.uc.se/schemas/ucOrderReply/" d5p1:lang="eng">
<d5p1:report d5p1:id="6002124815" d5p1:name="'FirstName LastName" d5p1:styp="K39" d5p1:index="0">
<d5p1:group d5p1:id="W45A" d5p1:index="0" d5p1:key="" d5p1:name="Credit commitments">
<d5p1:term d5p1:id="W45A01">20211221</d5p1:term>
<d5p1:term d5p1:id="W45A02">4013072</d5p1:term>
<d5p1:term d5p1:id="W45A08">4</d5p1:term>
</d5p1:group>
<d5p1:group d5p1:id="W450" d5p1:index="0" d5p1:key="" d5p1:name="Credit commitments">
<d5p1:term d5p1:id="W45001">20211130</d5p1:term>
<d5p1:term d5p1:id="W45022">4043072</d5p1:term>
<d5p1:term d5p1:id="W45023">993344</d5p1:term>
<d5p1:term d5p1:id="W45024">0</d5p1:term>
</d5p1:group>
<d5p1:group d5p1:id="W450" d5p1:index="1" d5p1:key="" d5p1:name="Credit commitments">
<d5p1:term d5p1:id="W45001">20210930</d5p1:term>
<d5p1:term d5p1:id="W45022">4240629</d5p1:term>
<d5p1:term d5p1:id="W45023">1185833</d5p1:term>
<d5p1:term d5p1:id="W45024">0</d5p1:term>
</d5p1:group>
<d5p1:group d5p1:id="W450" d5p1:index="2" d5p1:key="" d5p1:name="Credit commitments">
<d5p1:term d5p1:id="W45001">20210731</d5p1:term>
<d5p1:term d5p1:id="W45022">4254142</d5p1:term>
<d5p1:term d5p1:id="W45023">1194268</d5p1:term>
<d5p1:term d5p1:id="W45024">0</d5p1:term>
<d5p1:term d5p1:id="W45025">3029874</d5p1:term>
</d5p1:group>
<d5p1:group d5p1:id="W495" d5p1:index="0" d5p1:key="" d5p1:name="Income info">
<d5p1:term d5p1:id="W49501">2021</d5p1:term>
<d5p1:term d5p1:id="W49517">411000</d5p1:term>
<d5p1:term d5p1:id="W49518">53400</d5p1:term>
<d5p1:term d5p1:id="W49522">410900</d5p1:term>
<d5p1:term d5p1:id="W49541">75200</d5p1:term>
<d5p1:term d5p1:id="W49591">75200</d5p1:term>
<d5p1:term d5p1:id="W49592">464300</d5p1:term>
</d5p1:group>
</d5p1:report>
</reports>
</xmlReply>
</ucReport>
</Content>
</UcSwedishIndividualReport>
My current code only grabs all of the values, regardless of their index or of their id. The problem is that I can't differ the old entries from the most recent ones with this method. I've previously tried to differ the values later on by ranking the rows in the order that they were read and then selecting the variables with the lowest row rank, but it seems like the code doesn't always load the table starting from id 0, which breaks my logic.
drop table #UC
;WITH XMLNAMESPACES('http://www.uc.se/schemas/ucOrderReply/' AS ns,'http://www.uc.se/schemas/ucOrderReply/' AS d5p1)
SELECT ok.*
,X.g.value('(@d5p1:id)','varchar(20)') AS id
,X.g.value('(text())[1]','varchar(20)') AS term
into #UC
FROM #1 as ok
CROSS APPLY ok.[Message].nodes('UcSwedishIndividualReport/Content/ns:ucReport/ns:xmlReply/ns:reports/ns:report/ns:group/ns:term') X(g)
I suppose you can use XPath to filter the specific <d5p1:group>
nodes and their children. Simplified XPath would be:
//d5p1:group[@d5p1:id="W45A" or @d5p1:id="W450"][@d5p1:index="0"]/d5p1:term
And simplified query:
SELECT term.value('@d5p1:id', 'varchar(100)') AS id, term.value('.', 'varchar(100)') AS term
FROM ... AS t(doc)
CROSS APPLY doc.nodes('//d5p1:group[@d5p1:id="W45A" or @d5p1:id="W450"][@d5p1:index="0"]/d5p1:term') x(term)