Search code examples
xmlt-sqlxpathxquerysql-server-2017-express

How to get distinct elements with all values from all arrays XML in T-SQL


I need some help trying to import the following XML document

<ROOT>
<MeasuresList>
  <Measure ID="164">       
    <Names>
    <Name ID="1072687" langid="33"/>
    <Name ID="1017672" langid="13">miljoner skär</Name>
   </Names>
  </Measure>
  <Measure ID="362">       
   <Names>
    <Name ID="1072687" langid="33"/>
    <Name ID="1017672" langid="13">miljoner skär</Name>
    <Name ID="4068857" langid="19">hyller</Name>
    <Name ID="3330057" langid="18">ράφια</Name>
    <Name ID="3291105" langid="20">raflar</Name>
    <Name ID="2813622" langid="10"/>         
   </Names>       
  </Measure>
  <Measure ID="162">       
   <Names>
    <Name ID="1072687" langid="33"/>
    <Name ID="1017672" langid="13">miljoner skär</Name>
    <Name ID="1072427" langid="36"/>
    <Name ID="438237" langid="20">sayfa başına geçen dakika</Name>         
   </Names>       
  </Measure>
</MeasuresList>

Into a dbo.table (ID int PRIMARY KEY, langid int FK, Value NVARCHAR(MAX)) I am using

ALTER PROCEDURE [dbo].[AddNewMeasuresList] 
(
@XmlData XML
)

AS

DECLARE @doc int;
EXEC sp_xml_preparedocument @doc OUTPUT, @XmlData

BEGIN TRANSACTION T1
Insert Measure_Name
SELECT *
FROM OPENXML(@doc,'//ROOT/MeasuresList/Measure/Names/Name') WITH Measure_Name AS mes
WHERE NOT EXISTS
(   
    SELECT 1 FROM dbo.Measure_Name WHERE ID=mes.ID
)
COMMIT TRANSACTION T1
EXEC sp_xml_removedocument @doc

However i'm getting an exception in my c# application

System.Data.SqlClient.SqlException (0x80131904): 
Violation of PRIMARY KEY constraint 'PK_Measure_Name'. 
Cannot insert duplicate key in object 'dbo.Measure_Name'. The duplicate key value is (1072687)

The dbo.Measure_Name Table is empty but there are duplicate entries in other Measure XML nodes with the same Name ID.

Since my SELECT statement returns all "Name" which results in duplicate PRIMARY KEYS, how do i change the select statement to return all values but with only DISTINCT/UNIQUE ID's?


Solution

  • Your own solution might work, but this can be done better...

    FROM OPENXML together with the SPs to prepare and to remove the document is outdated and should not be used any more. Rather use the native XML support.

    Try this and adapt it to your needs:

    Your XML

    DECLARE @xml XML=
    N'<ROOT>
      <MeasuresList>
        <Measure ID="164">
          <Names>
            <Name ID="1072687" langid="33" />
            <Name ID="1017672" langid="13">miljoner skär</Name>
          </Names>
        </Measure>
        <Measure ID="362">
          <Names>
            <Name ID="1072687" langid="33" />
            <Name ID="1017672" langid="13">miljoner skär</Name>
            <Name ID="4068857" langid="19">hyller</Name>
            <Name ID="3330057" langid="18">ράφια</Name>
            <Name ID="3291105" langid="20">raflar</Name>
            <Name ID="2813622" langid="10" />
          </Names>
        </Measure>
        <Measure ID="162">
          <Names>
            <Name ID="1072687" langid="33" />
            <Name ID="1017672" langid="13">miljoner skär</Name>
            <Name ID="1072427" langid="36" />
            <Name ID="438237" langid="20">sayfa başına geçen dakika</Name>
          </Names>
        </Measure>
      </MeasuresList>
    </ROOT>';
    

    --My target table will include the @ID found in <Measure>, don't know if you need this...

    DECLARE @target TABLE(MeasureID int, NameID int, [langid] int, [Value] NVARCHAR(MAX))
    

    --The data will use the table @target as staging table
    --The query uses .nodes(), APPLY and .value to read the values out of your XML

    INSERT INTO @target(MeasureID,NameID,[langid],[Value]) 
    SELECT m.value(N'@ID',N'int') AS MeasureID
          ,n.value(N'@ID',N'int') AS NameID
          ,n.value(N'@langid',N'int') AS [langid]
          ,n.value(N'text()[1]',N'nvarchar(max)') AS [Value]
    FROM @xml.nodes(N'/ROOT/MeasuresList/Measure') AS A(m)
    OUTER APPLY A.m.nodes(N'Names/Name') AS B(n);
    

    --this returns the full (denormalized) content

    SELECT * FROM @target;
    

    After this you can use any kind of GROUP BY, aggregation, whatever you might need, to shift this into your target tables.

    I'd suggest a structure with at least two related tables reflecting your 1:n related Measure/Name relation.

    Some explanation

    Using FROM @xml.nodes(N'/ROOT/MeasuresList/Measure') AS A(m) will result in a derived table of all (repeating) <Measure> elements. This "table" is called A, while the "column" is called m (for "measure"). This column is an XML fragment representing one single <Measure> per row.

    With OUTER APPLY A.m.nodes(N'Names/Name') AS B(n) I take this derived table A, use its column m and again .nodes() to dive down to the (repeated) <Name> elements.

    The actual values are taken with .value(). This method takes an XPath/XQuery in the first place and the target type in the second.

    Hope this is clear now... :-D