sql-server performance indexing query-optimization

MS SQL Server slow UPDATE query with Eager Spool despite index within SSMS on local server

I'm running a SQL query within SSMS on the computer that is running Microsoft SQL Server 2019 Developer Edition. The query seems to be significantly slower than it should.

I added an index on the columns in question that the query uses, and the duration of the query improved from around 120 seconds to around 85 seconds, but I still think it should be faster, comparing it to other queries between the same tables, with similar column data types, that complete in less than a second.

The query updates a summary table by aggregating data from a worktable.

The summary table is approx 10,000 rows and 30 columns of mixed VARCHAR, FLOAT and INT columns.

The worktable table is approx 7,500,000 rows, with a similar number of columns and a similar mix of column data types.

Looking at the execution plan, it seems to do an Index Spool (Eager Spool) which, from reading, often can mean that the query is missing an index.

I have a non-clustered index on the three columns the query...queries.

The complete query is:

UPDATE SummaryTable 
SET LifeTeacherTopScore = (SELECT COUNT(*) 
                           FROM WorkTable AS T1 
                           WHERE T1.Z_date < SummaryTable.Z_date 
                             AND T1.Teacher = SummaryTable.Teacher 
                             AND T1.Score = 1)
WHERE SummaryTable.Teacher <> ''

The execution plan is here:

https://www.brentozar.com/pastetheplan/?id=ByEo9Ru42

On the WorkTable table I have a non-clustered index on Z_date, Teacher and Score.

There are around 10,000 unique entries in the Z_date column, around 25,000 unique entries in the Teacher column, and around 30 unique entries in the Score column. The teacher column contains around 1,500,000 NULL entries, hence why I exclude NULLs within the above query by finishing it with:

where SummaryTable.Teacher <> ''

Z_date format is DATE, Teacher format is VARCHAR(50), Score format is INT.

After doing an Index Scan (NonClustered), using the index I added, the query then does an Index Spool (Eager Spool), with a cost of 87%, which seems to be the reason for the duration of the scan.

I have read dozens of pages re slow queries, indexes, etc, without finding any page that has helped to isolate the issue.

Any suggestions, recommendations, on improving the performance of this query greatly appreciated.

Solution

The order of columns in an index matters. Try (Score, Teacher, Z_Date). What's this about?

SQL Server and other RDBMS systems use so-called B-Tree indexes. Think of them as old-school telephone books. To find the rows after a certain Z_Date for a particular Score and Teacher, you visit the section of the book with Scores of 1. You then look for the teacher, then you look for the first eligible date. You can then look at each row in turn.

In indexing terms, you random-access the B-Tree index to the first eligible row, then scan the rows sequentially. This only works if, in the index, the Z_Date column comes after the other two columns. If Z_Date were first in the column you'd have to pick through all the rows looking for matching ones, and that is expensive.

Pro tip: In SSMS: Right-click in a query window, then select Show Actual Execution Plan, then run the query. The execution plan display sometimes recommends a new index to create.