Search code examples
c#sqlsql-serverxmlsqlxml

Querying Serialized Dictionary in XML Column


I have a Dictionary that I'm serializing in C# and storing in an XML column on MSSQL Server. The serialized XML looks like this:

<ArrayOfKeyValueOfstringanyType xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
  <KeyValueOfstringanyType>
     <Key>code</Key><Value xmlns:d3p1="http://www.w3.org/2001/XMLSchema" i:type="d3p1:string">WFR 052</Value>
  </KeyValueOfstringanyType>
  <KeyValueOfstringanyType>
    <Key>type</Key><Value xmlns:d3p1="http://www.w3.org/2001/XMLSchema" i:type="d3p1:string">Newsletter</Value>
  </KeyValueOfstringanyType>
</ArrayOfKeyValueOfstringanyType>

Ultimately, I want to get the Value, where the Key is "code". The first step I took was to just get the first value, regardless of key.

SELECT [xml_column].value('(/ArrayOfKeyValueOfstringanyType/KeyValueOfstringanyType/Value)[1]','varchar(255)') as val
FROM [my_table]

I get null values back. I know it has something to do with the namespaces, because when I try the same query with the namespaces removed, I get a value. I've seen some other scenarios with namespaces, but my XML format is a bit different, and I'm struggling to find the right syntax.

Here's another question I looked at:

XML Field - Query


Solution

  • The problem you are having is due to the schema specification in your XML. If a node in your XML document is part of a schema you have to specify that schema when querying that node. Alternatively, you can use a wildcard for the schema specification. But specifying the node name without schema does not work (as you have experienced).

    So let's look at an example:

    SQL Fiddle

    MS SQL Server 2008 Schema Setup:

    CREATE TABLE dbo.Tbl(id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, dict XML);
    
    INSERT INTO dbo.Tbl(dict)
    VALUES('<ArrayOfKeyValueOfstringanyType xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
          <KeyValueOfstringanyType>
             <Key>code</Key><Value xmlns:d3p1="http://www.w3.org/2001/XMLSchema" i:type="d3p1:string">WFR 052</Value>
          </KeyValueOfstringanyType>
          <KeyValueOfstringanyType>
            <Key>type</Key><Value xmlns:d3p1="http://www.w3.org/2001/XMLSchema" i:type="d3p1:string">Newsletter</Value>
          </KeyValueOfstringanyType>
        </ArrayOfKeyValueOfstringanyType>');
    

    The table dbo.Tbl is created with just two columns, an identity id column and a dict column for the XML.

    To get your first query to work, specify the schema using a wildcard for each node:

    Query 1:

    SELECT dict.value('/*:ArrayOfKeyValueOfstringanyType[1]/*:KeyValueOfstringanyType[1]/*:Key[1]','NVARCHAR(MAX)')
    FROM dbo.Tbl;
    

    That results in the first Key being returned:

    Results:

    | COLUMN_0 |
    |----------|
    |     code |
    

    Now, you want to return the Value nodes for all key value pairs where the Key = 'code'. You could do that filtering in xquery, but I usually prefer filtering in SQL. For that we first need to get all pairs back. The XML nodes function gets us a step closer:

    Query 2:

    SELECT id,key_value.query('.')
      FROM dbo.Tbl
     CROSS APPLY dict.nodes('/*:ArrayOfKeyValueOfstringanyType/*:KeyValueOfstringanyType') AS N(key_value);
    

    It returns one row per KeyValueOfstringanyType node:

    Results:

    | ID |                                                                                                                                                                                                                                                                                                      COLUMN_1 |
    |----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
    |  1 |    <p1:KeyValueOfstringanyType xmlns:p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays"><p1:Key>code</p1:Key><p1:Value xmlns:d3p1="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="d3p1:string">WFR 052</p1:Value></p1:KeyValueOfstringanyType> |
    |  1 | <p1:KeyValueOfstringanyType xmlns:p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays"><p1:Key>type</p1:Key><p1:Value xmlns:d3p1="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="d3p1:string">Newsletter</p1:Value></p1:KeyValueOfstringanyType> |
    

    Using that, we can get to the Key and Value using the XML.value function: Query 3:

    SELECT id,
           key_value.value('./*:Key[1]','NVARCHAR(MAX)') AS [key],
           key_value.value('./*:Value[1]','NVARCHAR(MAX)') AS [value]
      FROM dbo.Tbl
     CROSS APPLY dict.nodes('/*:ArrayOfKeyValueOfstringanyType/*:KeyValueOfstringanyType') AS N(key_value);
    

    Now we have a row per key value pair with the key and the value in separate columns:

    Results:

    | ID |  KEY |      VALUE |
    |----|------|------------|
    |  1 | code |    WFR 052 |
    |  1 | type | Newsletter |
    

    From there is is easy to apply additional filters in a WHERE clause:

    Query 4:

    WITH KeyValues AS(
    SELECT id,
           key_value.value('./*:Key[1]','NVARCHAR(MAX)') AS [key],
           key_value.value('./*:Value[1]','NVARCHAR(MAX)') AS [value]
      FROM dbo.Tbl
     CROSS APPLY dict.nodes('/*:ArrayOfKeyValueOfstringanyType/*:KeyValueOfstringanyType') AS N(key_value)
    )
    SELECT *
      FROM KeyValues
     WHERE [Key] = 'code';
    

    Results:

    | ID |  KEY |   VALUE |
    |----|------|---------|
    |  1 | code | WFR 052 |