Search code examples
sql-servert-sqlselectsql-execution-plan

Force Index seek rather than scan to get next record in T-SQL


I have a database with many tables. I need a query to read the next record in key order, given the keys of the current record. [NOTE: I am emulating an old C-ISAM based system, which reads records one at a time, so I can't use a normal set-based operation]. For example, given a table with a single key, my query might look like:

SELECT TOP 1 COL1, COL2, COL3, etc.
FROM TAB1
WHERE COL1 > @P0
ORDER BY COL1

With two key columns, it gets a little trickier:

SELECT TOP 1 COL1, COL2, COL3, etc.
FROM TAB1
WHERE COL1 = @P0 AND COL2 > @P1
   OR COL1 > @P0
ORDER BY COL1, COL2

Adding additional key columns follows the same pattern.

Now, if you think about it, SQL Server should be able to use an INDEX SEEK operator to find the desired record, and indeed it does so for most of my tables. However, I have found one table where, no matter what I have tried, SQL Server insists on doing an INDEX SCAN, which is of course much slower in a big table (which this table is).

I have tried many things including using WITH (FORCESEEK), OPTIMIZE FOR, and several others. In all cases I end up with an INDEX SCAN.

However, I did find one query pattern that resulted in two seeks and a concatenation:

SELECT TOP 1 COL1, COL2, COL3, etc.
FROM ( 
   SELECT TOP 1 COL1, COL2, COL3, etc.
   FROM TAB1
   WHERE COL1 = @P0 AND COL2 > @P1

   UNION ALL

   SELECT TOP 1 COL1, COL3, COL3, etc.
   FROM TAB
   WHERE COL1 > @P0
) AS Q
ORDER BY COL1, COL2

This is an improvement, but still generates an inefficient query plan. The more key fields there are, the worse the plan becomes. I don't want to use this technique since most of my tables work fine with the simpler query, resulting in a single INDEX SEEK in the plan.

Note that the query code is pre-generated for each table, not hand coded as needed, so it would be difficult to use different schemes for different tables. There are over 1,000 tables in the database, so it would be impractical to determine which tables have the issue and which do not.

This scheme will be rather heavily used (about 100+ users), including lots of iterating through entire tables [again: I can't use set operations] so efficiency is important. For example, I have one process that, because of the index scan, takes about an hour to get through a 100,000 record table (an average of only 27 records per second). It starts out reasonably quickly, but as it works through the table, gets slower and slower. Scanning another table, which doesn't exhibit the problem, results in about 2,500 records per second, which is acceptable.

I am looking for a way to 'force' SQL Server to use the INDEX SEEK rather than the INDEX SCAN.

Here is the schema of a table that works:

CREATE TABLE [dbo].[CUSHST](
    [ID_] [bigint] IDENTITY(1,1) NOT NULL,
    [CUSTOMER_NUMBER] [nvarchar](5) NOT NULL,
    [ORDER_NUMBER] [decimal](6, 0) NOT NULL,
    [ORDER_DATE] [date] NOT NULL,
    [ORDER_TYPE] [nvarchar](1) NOT NULL,
    [SLS_CODE] [nvarchar](3) NOT NULL,
    [INVOICE_NUMBER] [decimal](6, 0) NOT NULL,
    [INVOICE_DATE] [date] NOT NULL,
    [GOLD_PRICE] [decimal](8, 2) NOT NULL,
    [SILVER_PRICE] [decimal](8, 2) NOT NULL,
    [ITEM_NUMBER] [nvarchar](10) NOT NULL,
    [PRODUCT_CATEGORY] [nvarchar](2) NOT NULL,
    [DESCRIPTION] [nvarchar](30) NOT NULL,
    [EXT_DESCRIPTION] [nvarchar](30) NOT NULL,
    [UNIT_OF_MEASURE] [nvarchar](3) NOT NULL,
    [QTY_ORDERED] [decimal](8, 2) NOT NULL,
    [QTY_SHIPPED] [decimal](8, 2) NOT NULL,
    [PRICE_PER_UNIT] [decimal](8, 2) NOT NULL,
    [DISCOUNT_PER_UNIT] [decimal](8, 2) NOT NULL,
    [LABOR_PER_UNIT] [decimal](8, 2) NOT NULL,
    [COST_PER_UNIT] [decimal](8, 2) NOT NULL,
    [SALE_AMOUNT] [decimal](9, 2) NOT NULL,
    [COST_AMOUNT] [decimal](9, 2) NOT NULL,
    [PO_NUMBER] [nvarchar](20) NOT NULL,
    [REVERSE_DATE] [decimal](7, 0) NOT NULL,
    [COMMENTS_1] [nvarchar](35) NOT NULL,
    [COMMENTS_2] [nvarchar](35) NOT NULL,
    [SHIPTO_NAME] [nvarchar](35) NOT NULL,
    [SHIPTO_ADDRESS_1] [nvarchar](35) NOT NULL,
    [SHIPTO_ADDRESS_2] [nvarchar](35) NOT NULL,
    [SHIPTO_CITY] [nvarchar](25) NOT NULL,
    [SHIPTO_STATE] [nvarchar](2) NOT NULL,
    [SHIPTO_ZIP] [nvarchar](10) NOT NULL,
    [TERM_CODE] [nvarchar](3) NOT NULL,
    [LENGTH_CODE] [nvarchar](10) NOT NULL,
    [SIZE_CODE] [nvarchar](12) NOT NULL,
    [TYPE_CODE] [nvarchar](2) NOT NULL,
    [KARAT] [nvarchar](4) NOT NULL,
    [SHIP_VIA_CODE] [nvarchar](3) NOT NULL,
    [SHIP_DATE] [date] NOT NULL,
    [PIECES_ORDERED] [decimal](7, 0) NOT NULL,
    [PIECES_SHIPPED] [decimal](7, 0) NOT NULL,
    [SURCHARGE] [decimal](8, 2) NOT NULL,
    [TIE_BREAKER] [int] NOT NULL,
    [PD_PRICE] [decimal](8, 2) NOT NULL,
    [PT_PRICE] [decimal](8, 2) NOT NULL,
    [WIDTH_CODE] [nvarchar](10) NOT NULL,
    [BACKORDER_FLAG] [nvarchar](1) NOT NULL,
    [PROD_BATCH1] [decimal](6, 0) NOT NULL,
    [PROD_BATCH2] [decimal](6, 0) NOT NULL,
    [PROD_BATCH3] [decimal](6, 0) NOT NULL,
    [PROD_BATCH4] [decimal](6, 0) NOT NULL,
    [PROD_BATCH5] [decimal](6, 0) NOT NULL,
    [PROD_BATCH6] [decimal](6, 0) NOT NULL,
    [PROD_BATCH_STRING] [nvarchar](15) NOT NULL,
    [LINE_NUMBER] [decimal](4, 0) NOT NULL,
    [BASE_SURCHARGE] [decimal](8, 2) NOT NULL,
    [SHIPTO_COUNTRY] [nvarchar](2) NOT NULL,
    [APPROVED_BY] [nvarchar](15) NOT NULL,
    [ENTERED_BY] [nvarchar](15) NOT NULL,
    [TSR_ACCOUNT_ID] [nvarchar](4) NOT NULL,
    [CC_CHARGE] [decimal](6, 2) NOT NULL,
    [QTY_ORDERED3] [decimal](10, 3) NOT NULL,
    [QTY_SHIPPED3] [decimal](10, 3) NOT NULL,
    [USE_LINE_PRICING] [nvarchar](1) NOT NULL,
    [GOLD_PRICE_LP] [decimal](8, 2) NOT NULL,
    [SILVER_PRICE_LP] [decimal](8, 2) NOT NULL,
    [PT_PRICE_LP] [decimal](8, 2) NOT NULL,
    [PD_PRICE_LP] [decimal](8, 2) NOT NULL,
    [COMMENTS_3] [nvarchar](35) NOT NULL,
    [COMMENTS_4] [nvarchar](35) NOT NULL,
    [INCO_TERM] [nvarchar](3) NOT NULL,
    [CURRENCY_FACTOR] [decimal](8, 7) NOT NULL,
    [PF_SIZE_CODE] [nvarchar](25) NOT NULL,
    [PF_BANDWIDTH] [nvarchar](20) NOT NULL,
    [LOGO] [nvarchar](1) NOT NULL,
    [SHIPTO_ATTENTION] [nvarchar](35) NOT NULL,
    [PROFORMA_AIR_DEST] [nvarchar](35) NOT NULL,
    [TIME_PREPPED] [int] NOT NULL,
    [OWN_LABEL] [nvarchar](1) NOT NULL,
    [FROM_MEMO] [nvarchar](1) NOT NULL,
    [FROM_MEMO_CUSTOMER_NUMBER] [nvarchar](5) NOT NULL,
    [PURCHASE_TO_POOL] [nvarchar](1) NOT NULL,
    [ROWVERSION] [timestamp] NOT NULL,
 CONSTRAINT [PK_CUSHST] PRIMARY KEY CLUSTERED 
(
    [CUSTOMER_NUMBER] ASC,
    [INVOICE_DATE] ASC,
    [INVOICE_NUMBER] ASC,
    [ITEM_NUMBER] ASC,
    [TIE_BREAKER] ASC,
    [ID_] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

Here is the table that fails:

CREATE TABLE [dbo].[GLJNL](
    [ID_] [bigint] IDENTITY(1,1) NOT NULL,
    [ACCOUNT_NUMBER] [decimal](7, 0) NOT NULL,
    [TRX_DATE] [date] NOT NULL,
    [DEBIT_AMOUNT] [decimal](11, 2) NOT NULL,
    [CREDIT_AMOUNT] [decimal](11, 2) NOT NULL,
    [SOURCE] [nvarchar](6) NOT NULL,
    [REFERENCE] [nvarchar](35) NOT NULL,
    [REF_1] [nvarchar](35) NOT NULL,
    [REF_2] [nvarchar](35) NOT NULL,
    [ROWVERSION] [timestamp] NOT NULL,
 CONSTRAINT [PK_GLJNL] PRIMARY KEY CLUSTERED 
(
    [ACCOUNT_NUMBER] ASC,
    [TRX_DATE] ASC,
    [ID_] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

Solution

  • The following query generates a perfectly efficient execution plan for me.

    DECLARE @ACCOUNT_NUMBER decimal(7,0), @TRX_DATE DATE, @ID_ BIGINT
    
    SELECT TOP 1 * 
    FROM  [dbo].[GLJNL]
    where (ACCOUNT_NUMBER = @ACCOUNT_NUMBER and TRX_DATE = @TRX_DATE and ID_ >= @ID_)
     or (ACCOUNT_NUMBER = @ACCOUNT_NUMBER and TRX_DATE > @TRX_DATE)
     or (ACCOUNT_NUMBER > @ACCOUNT_NUMBER)
    ORDER BY 
        [ACCOUNT_NUMBER] ASC,
        [TRX_DATE] ASC,
        [ID_] ASC
    

    It just has one seek operator with three disjunct seek predicates and will process those in order and return the first row from the first one that returns a result.

    enter image description here

    It would be "nicer" if it just did a single seek predicate starting at (@ACCOUNT_NUMBER, @TRX_DATE, @ID_) but there is no way of expressing the query to get SQL Server to do that AFAIK (it doesn't support syntax such as WHERE (ACCOUNT_NUMBER, TRX_DATE, ID_) > (@ACCOUNT_NUMBER, @TRX_DATE, @ID_)) and the difference in practical terms is pretty minimal anyway.

    If you are already using this pattern then check your parameter datatypes exactly match the column datatypes