Search code examples
sqlsql-serversql-server-2017

Parse out XML data into Rows in SQL Server


I am trying to parse WordPress data in our SQL Server from an Elasticsearch structure.

This is the contents of one field on one record;

<category domain="category" nicename="featured">Featured</category>
<category domain="post_tag" nicename="name1">Name 1</category>
<category domain="post_tag" nicename="name-2">Name 2</category>
<category domain="post_tag" nicename="different-name">Different Name</category>
<category domain="type" nicename="something-else">Something Else</category>

I'd like to parse this out as a table with the headers Domain, NiceName and Contents and a row for each of these nodes in the data. Something along these lines;

Domain NiceName Contents
category featured Featured
post_tag name1 Name 1
post_tag name-2 Name 2
post_tag different-name Different Name
type something-else Something Else

The number of nodes is different for each row in the data and can appear in any order. Currently the data is stored in a varchar data type but this can be modified if it's best to parse using something like XML.


Solution

  • It's recommended that you use the xml data type for storing XML data. But if you must store it in a varchar column you can use try_cast to cast it to XML (which results in null if it's not actually valid XML) and then work with it using the normal nodes(), query() and value() XML methods such as the following...

    create table dbo.Records (
      OneField varchar(max)
    );
    insert dbo.Records (OneField) values
      ('<category domain="category" nicename="featured">Featured</category>
    <category domain="post_tag" nicename="name1">Name 1</category>
    <category domain="post_tag" nicename="name-2">Name 2</category>
    <category domain="post_tag" nicename="different-name">Different Name</category>
    <category domain="type" nicename="something-else">Something Else</category>');
    
    select
      Category.value('@domain', 'varchar(50)') as [Domain],
      Category.value('@nicename', 'varchar(50)') as [NiceName],
      Category.value('(text())[1]', 'varchar(50)') as [Contents]
    from dbo.Records R
    cross apply (select try_cast(OneField as XML)) X(OneFieldXML)
    cross apply OneFieldXML.nodes('/category') N(Category);
    
    Domain NiceName Contents
    category featured Featured
    post_tag name1 Name 1
    post_tag name-2 Name 2
    post_tag different-name Different Name
    type something-else Something Else