Search code examples
sql-serverindexingcovering-index

How can a covering index satisfy more than one query?


I've inherited a MS Sql database hosted in Azure. Looking for performance improvement, I've been reading a lot about indexing and covering index. (Maybe this is the most complete reading that I've found: https://www.red-gate.com/simple-talk/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/)

But one doubt still remains...

So, for example, for the billing table below (which has about 8 milion lines), I've found that the most used fields used in the queries' where clause are (within joins or not): PAYMENT_DATE, DUE_DATE, CUSTOMER_ID, DELAY_DAYS, AMOUNT .

CREATE TABLE [dbo].[BILLING](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [CHANGED_DATE] [datetime] NULL,
    [INCLUDED_DATE] [datetime] NULL,
    [CHANGED_USER_ID] [int] NULL,
    [INCLUDED_USER_ID] [int] NULL,
    [BILL_CODE] [varchar](255) NOT NULL,
    [PAYMENT_DATE] [datetime] NULL,
    [DUE_DATE] [datetime] NOT NULL,
    [AMOUNT] [float] NOT NULL,
    [AMOUNT_PAYED] [float] NULL,
    [CUSTOMER_ID] [int] NOT NULL,
    [OUR_NUMBER] [varchar](200) NULL,
    [TYPE] [varchar](250) NULL,
    [BANK_ID] [int] NULL,
    [ISSUE_DATE] [datetime] NULL,
    [STATE] [varchar](20) NULL,
    [DUNNING_STATE_ID] [int] NULL,
    [OPEN_VALUE] [float] NULL,
    [ACCREDIT_VALUE] [float] NULL,
    [LOWER_VALUE] [float] NULL,
    [DISCCOUNT_VALUE] [float] NULL,
    [INTEREST_VALUE] [float] NULL,
    [FINE_VALUE] [float] NULL,
    [RECEIVED_AMOUNT] [float] NULL,
    [DELAY_DAYS] [int] NULL,
    [BRANCH_ID] [int] NULL,
    [FIELD1] [varchar](250) NULL,
    [FIELD2] [varchar](250) NULL,
    [FIELD3] [varchar](250) NULL,
    [FIELD4] [varchar](250) NULL,
    [FIELD5] [varchar](250) NULL,
    [OBS1] [varchar](250) NULL,
    [OBS2] [varchar](250) NULL,
    [OBS3] [varchar](250) NULL,
    [INTEREST_RATE] [float] NULL,
    [INTEREST_CALC] [float] NULL,
    [AGREEMENT_STATE] [varchar](20) NULL,
    [AGREEMENT_ID] [int] NULL,
PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)

In addition, the target queries for optimizing make calculations on select clause over: AMOUNT, DELAY_DAYS, COUNT(ID). For example:

SELECT
        T.CUSTOMER_ID AS CUSTOMER_ID
        , COUNT(T.ID) AS NUM_BILLS
        , COUNT(
            CASE
                WHEN T.DELAY_DAYS <= 0 THEN 1
                ELSE NULL
            END
        ) AS DEPOSITS
        , COUNT(
            CASE
                WHEN T.DELAY_DAYS > 0 THEN 1
                ELSE NULL
            END
        ) AS DEFAULTED
        , COUNT(
            CASE
                WHEN T.DELAY_DAYS BETWEEN 30 AND 60 THEN 1
                ELSE NULL
            END
        ) AS DEFAULTED_30
        , COUNT(
            CASE
                WHEN T.DELAY_DAYS BETWEEN 60 AND 90 THEN 1
                ELSE NULL
            END
        ) AS DEFAULTED_60
        , COUNT(
            CASE
                WHEN T.DELAY_DAYS > 90 THEN 1
                ELSE NULL
            END
        ) AS DEFAULTED_90
        , MAX(T.DELAY_DAYS) AS MAX_DEFAULTED_TIME
        , SUM(
            CASE
                WHEN T.DELAY_DAYS > 0 THEN T.DELAY_DAYS
                ELSE 0
            END
        ) AS SUM_DEFAULTED_TIME
        , SUM(T.AMOUNT) AS AMOUNT
        , SUM(
            CASE
                WHEN T.DELAY_DAYS > 0 THEN T.AMOUNT
                ELSE 0
            END
        ) AS DEFAULTED_AMOUNT
    FROM BILLING T
    WHERE
        T.DUE_DATE < GETDATE()
        AND T.AMOUNT > 0
    GROUP BY
        T.CUSTOMER_ID

Thus, it looked obvious to me that the following index would solve all of my problems:

CREATE NONCLUSTERED INDEX [ix_Titulo_main_fields] ON [dbo].[BILLING]
(
    [PAYMENT_DATE] ASC,
    [DUE_DATE] DESC,
    [AMOUNT] ASC,
    [CUSTOMER_ID] ASC,
    [STATE] ASC,
    [DELAY_DAYS] ASC,
    [BRANCH_ID] ASC,
    [AGREEMENT_ID] ASC
)
INCLUDE (   [BILLING_CODE],
    [AGREEMENT_STATE],
)
GO;

In contrast, when I ask for the query plan on Management Studio, the SQL Server doesn't use this index and suggests me creating a new one:

CREATE NONCLUSTERED INDEX [ix_billing_due_date_amount] ON [dbo].[billing]
(
    [due_date] ASC,
    [amount] ASC
)
INCLUDE (   [customer_id],
    [delay_days])
GO

So, the doubts are:
does the covering index needs to be exactly what is the WHERE clause searching for?
If it's true, how can a covering index satisfy more than one query?
Otherwise, why don't the former index satisfy the query?

I really don't know where I've missed something...

Thanks in advance!


Solution

  • Order matters. Since your proposed index begins with [payment_date] but the query predicate does not include [payment_date], the index is not likely to be more advantageous than a table scan.

    It is possible to have a single index cover multiple queries. The first listed field of the index will almost always need be in the predicate of all the queries. For improved results, also apply this logic to the 2nd field, 3rd field, etc.

    When there is more than one choice for a position, one choice may perform better than another.

    Side note: Oracle has a feature called "index skip scan" that allows an index to be used even if the leading column is not in the predicate. It is effective when the leading column has few distinct values (from learningintheopen.org).