Search code examples
sqlsql-serverperformanceazure-sql-databasexml-column

Why does the inclusion of an XML column in a SELECT query have such a radically negative effect on query performance?


I've been struggling with a query performance issue for a few weeks now. At this point I've squeezed absolutely everything out of the query in terms of JOIN types, Indexing, Keeping Statistics up to date, etc... etc... but then I stumbled on something by accident.

A little background.

The table in question represents a Record

Id INT PK
Name NVARCHAR(50)
Status INT FK 
Created DATETIME
Version NVARCHAR(10)
Data XML

After some performance benchmarking, I realised that the inclusion of the final column in the select far outweighs things like indexing, join complexity & network considerations by somewhere between a factor 10x & 20x.

The following comparisons were done between SSMS on local dev machine connecting to SQL Azure.

SELECT Id FROM Records -- ~10 secs for 300,000 rows
SELECT Id, Name, Status, Created, Version FROM Records -- ~20 sec for 300,000 rows
SELECT * FROM Records -- ~350 sec for 300,000 rows

To be clear, I'm not doing anything crazy with the xml column (XML DML or XPath queries). Just simply including/excluding it from the select.

At this point, I think I've solved my problem by creating a RecordLight Entity, NHibernate Map & MVC Controller stack, purely for the purposes of searching & listing in our App.

But I'd like to understand why the inclusion of the XML column is having such a negative effect on Query Performance


Solution

  • One thing to consider is the size in bytes of your XML data.

    If you're connecting to a remote DB server for example, all that data needs to be downloaded to your client (even if the client is SSMS).

    I've seen the same thing with blob columns which contain MB's of data for example.

    If you do something like:

    SELECT Id, LEFT(Data, 10) FROM Records
    

    Do you see the same time to return the data?