Search code examples
sqlsql-serverreferencesubqueryquery-optimization

Using the full SQL reference to the name of a database object causes my query to run 500 times slower


Using the full SQL reference to the name of a database object causes my query to run 500 times slower.

When I use the full database.dbo.tablename.columnname, a few UPDATE queries take nearly 500 seconds to run, but if I edit one of the WHERE/AND clauses, with just the columnname, the query completes in less than a second.

I've compared the results of both queries and they output exactly the same data.

I had assumed that it is safer, and potentially faster, to fully specify the reference, rather than using just the column name.

What is strange is that I have other update statements in the same query, that reference the same column in the same manner, and they do not suffer this issue.

An example of one of the queries is:

UPDATE database.dbo.SUMMARYTABLE
SET database.dbo.LifetimeCount = (select count(*) from database.dbo.WORKTABLE as T1 
WHERE T1.retailer = database.dbo.SUMMARYTABLE.retailer
and T1.SalesTier = 1
and T1.Zdate < database.dbo.SUMMARYTABLE.Zdate)

A very uncomplicated query. It took 500 seconds to run. If I simply change the final AND clause to this:

UPDATE database.dbo.SUMMARYTABLE
SET database.dbo.LifetimeCount = (select count(*) from database.dbo.WORKTABLE as T1 
WHERE T1.retailer = database.dbo.SUMMARYTABLE.retailer
and T1.SalesTier = 1
and T1.Zdate < Zdate)

It takes less than a second.

Perhaps it is some sort of indexing issue.

I am running MS SQL server developer edition 2019. I am running the query from within the Microsoft SQL Server Management Studio on the system that is running the SQL server. The query drops, recreates and then populates a secondary table, named SUMMARYTABLE, using UPDATES and SET statements, from a main table named WORKTABLE.

The main table (WORKTABLE) has over 7M rows, with about 30 columns of mixed date, varchar, int and floats. The second table, SUMMARYTABLE is all floats and has about 50 columns, and is usually only a few hundred to a few thousand rows.

As I was building this multi UPDATE query, it gradually became slower and slower the more UPDATES I added.

Assuming I needed to index the database, I ran the "Database Tuning Advisor" from within the Microsoft SSMS.

After selecting the query in question for "Workload", and selecting the two tables in the database, and starting the analysis, it returned a list of indexes it recommended, to which I applied all.

This made the query run much faster, dropping from around 40 minutes, to around 3 minutes, total.

I was very happy, and continued adding update clauses to the query. However, not long after, the query time blew out to over an hour.

I set "set statistics time on" to see where the slowdown was, and it was on some of the latest update statements.

So I re-ran the Database Tuning Advisor, applied any recommendations, once again, but the slowness persists.

While I can easily just use the column name by itself, what is the cause of this?


Solution

  • I'm guessing that you have a Zdate column on the WORKTABLE The problem is that these are actually two very different queries

          and T1.date < database.dbo.SUMMARYTABLE.Zdate
    

    What your first query is doing is making an outer reference, which means that the subquery needs to look at the outer table to get a value to compare.

    UPDATE database.dbo.SUMMARYTABLE
    SET LifetimeCount = (
        select count(*)
        from database.dbo.WORKTABLE as T1 
        WHERE T1.retailer = database.dbo.SUMMARYTABLE.retailer
          and T1.SalesTier = 1
          and T1.date < database.dbo.SUMMARYTABLE.Zdate
    )
    

    The other version, because it has no table reference on the column, it is assumed it is referring to the inner query. It effectively becomes

    UPDATE database.dbo.SUMMARYTABLE
    SET LifetimeCount = (
        select count(*)
        from database.dbo.WORKTABLE as T1 
        WHERE T1.retailer = database.dbo.SUMMARYTABLE.retailer
          and T1.SalesTier = 1
          and T1.Zdate < T1.Zdate
    )
    

    Since T1.Zdate < T1.Zdate is a logical inconsistency, I would expect the whole subquery to be completely removed by the optimizer in place of a simple SET LifetimeCount = 0.

    Which is almost certainly not what you expected.

    As a rule, always specify table references in subqueries, due to this issue. And there is no need to specify the database name, your connection should be specifying that.