sql sql-server sql-server-2008 database-performance database-cursor

When will a FAST_FORWARD cursor have a work table (and is this something to avoid)?

Background

I noticed whilst experimenting with running total queries that sometimes the estimated plan just shows a "Fetch Query"

Fetch

and the actual plan shows repeated Fetches from the Clustered Index Scan

Fetch Scan

on other occasions (e.g when adding a TOP to the query) the estimated plan shows a "Population Query" stage that populates a work table

Fetch and Populate

With the actual plan showing a clustered index scan to populate the work table then repeated seeks against that work table.

Seeks

Question

What criteria does SQL Server use in choosing one approach over the other?
Would I be right in thinking that the first method (without the additional work table population step) is more efficient?

(Bonus question: If anyone could explain why each scan in the first query counts as 2 logical reads that might be quite enlightening too)

Additional Information

I have found this article here which explains that FAST_FORWARD cursors can either use a dynamic plan or a static plan. The first query in this case appears to be using a dynamic plan and the second one a static plan.

I've also found that if I try

SET @C2 = CURSOR DYNAMIC TYPE_WARNING FOR SELECT TOP ...

The cursor gets implicitly converted to a keyset cursor so it is clear that the TOP construct is not supported for dynamic cursors, perhaps for the reasons in Ruben's answer - Still looking for a definitive explanation of this.

However I have also read that dynamic cursors tend to be slower than their static counterparts (source 1, source 2) which seems surprising to me given that the static variety have to read the source data, copy it, then read the copy rather than just read the source data. The article I referenced earlier mentions that dynamic cursors use markers. Can anyone explain what these are? Is it just a RID or the CI key, or something different?

Script

SET STATISTICS IO OFF

CREATE TABLE #T ( ord INT IDENTITY PRIMARY KEY, total INT, Filler char(8000))

INSERT INTO #T (total) VALUES (37),(80),(55),(31),(53)

DECLARE @running_total INT, 
    @ord INT, 
    @total INT
    
SET @running_total = 0
SET STATISTICS IO ON
DECLARE @C1 AS CURSOR;
SET @C1 = CURSOR FAST_FORWARD FOR SELECT ord, total FROM #T ORDER BY ord;
OPEN @C1;
PRINT 'Initial FETCH C1'
FETCH NEXT FROM @C1 INTO @ord, @total ;
WHILE @@FETCH_STATUS = 0
BEGIN
  SET @running_total = @running_total + @total
  PRINT 'FETCH C1'
  FETCH NEXT FROM @C1 INTO @ord, @total ;
END

SET @running_total = 0
SET STATISTICS IO ON
DECLARE @C2 AS CURSOR;
SET @C2 = CURSOR FAST_FORWARD FOR SELECT TOP 5 ord, total FROM #T ORDER BY ord;
OPEN @C2;
PRINT 'Initial FETCH C2'
FETCH NEXT FROM @C2 INTO @ord, @total ;
WHILE @@FETCH_STATUS = 0
BEGIN
  SET @running_total = @running_total + @total
  PRINT 'FETCH C2'
  FETCH NEXT FROM @C2 INTO @ord, @total ;
END

PRINT 'End C2'
DROP TABLE #T

Solution

What criteria does SQL Server use in choosing one approach over the other?

It is primarily a cost-based decision. Quoting from the article you linked to, "In situations where the dynamic plan looks promising, the cost comparison may be heuristically skipped. This occurs mainly for extremely cheap queries, though the details are esoteric."

Would I be right in thinking that the first method (without the additional work table population step) is more efficient?

It depends. Dynamic and static cursor plans have different strengths and weaknesses. If all rows will eventually be touched, the static plan is likely to perform better. More on this in a moment.

It is clear that the TOP construct is not supported for dynamic cursors

This is true. All iterators in a dynamic cursor plan must be able to save and restore state, scan forward and backward, process one input row for each output row, and be non-blocking. Top, in general, does not satisfy all these requirements; the class CQScanTopNew does not implement the necessary Set/Get/Goto/Marker(), and ReverseDirection() methods (among others).

I have also read that dynamic cursors tend to be slower than their static counterparts.

This is often true, for Transact-SQL cursors, where most or all of the cursor set is touched. There is a cost associated with saving and restoring the state of a dynamic query plan. Where a single row is processed on each call, and all rows eventually touched, this save/restore overhead is maximized.

Static cursors have the overhead of making a copy of the set (which may be the dominant factor for a large set), but the per-row cost of retrieval is quite small. Keysets have a higher per-row retrieval overhead than static because they must outer join back to the source tables to retrieve non-key columns.

Dynamic cursors are optimal when a relatively small fraction of the set is accessed, and/or retrieval is not row-at-a-time. This is a typical access pattern in many common cursor scenarios, just not the ones blog posts tend to test :)

If anyone could explain why each scan in the first query counts as 2 logical reads that might be quite enlightening too

This is down to the way state is saved for the scan, and the way reads are counted.

The article I referenced earlier mentions that dynamic cursors use markers. Can anyone explain what these are? Is it just a RID or the CI key, or something different?

Markers exist for each iterator in a dynamic cursor plan, not just access methods. The 'marker' is all the state information necessary to restart the plan iterator at the point it left off. For an access method, an RID or index key (with uniquifier if necessary) is a big part of this, but not the whole story by any means.