Search code examples
sql-serverxpathquery-performancequery-tuning

Tuning a query to parse XML data on SQL Server 2014


I have a table on a SQL Server 2014 database that stores auditing information for record changes in a VARCHAR(MAX) column (poor man's CDC.)

This data is in following format:

<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />
...

I need to parse that information in order to retrieve the data transposed so that it looks like so:

Record    FieldName      OldValue   NewValue
------    ---------      --------   --------
1234      Assigned To    user1      user2
1234      Status         QA         Development
1234      Progress       Yes        No

The stored procedure attempts to do this by converting the data to XML and then using XPath retrieving the necessary pieces:

;WITH TT AS (
   SELECT TransId,
      CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
   FROM dbo.Trans
   WHERE TransDate >= '11/1/2016'
      AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))
SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
INTO #tmp
FROM TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);

Here is the execution plan: https://www.brentozar.com/pastetheplan/?id=rJF2GRB7g

The corresponding IO stats:

Table 'Trans'. Scan count 9, logical reads 27429, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 2964994, physical reads 0, read-ahead reads 0, lob logical reads 2991628, lob physical reads 0, lob read-ahead reads 0.

This query is excruciatingly slow (the example was for just 10 days' worth of data,) and gets progressively slower with more data.

What are my options for tuning this query?


Solution

  • What you really need to speed things up is some xml indexing. However, since you are creating the XML on the fly, this isn't happening. Effectively, this is the broad equivalent of a CROSS JOIN, and will get exponentially slower as time goes by.

    See cross apply xml query performs exponentially worse as xml document grows for a detailed discussion and how indexing helps. If you want to do this via XML, you really will need to store XML so you can index the XML.