Search code examples
sqlsql-serverindexingquery-optimization

How do you add an index to remove the clustered index scan on a table?


I am trying to optimize a SQL query in SSMS. When I execute it and review the execution plan, there is a clustered index scan on my table called SalesOrderLineItem.

Below is my SQL query

SELECT
    YEAR(SO.SALESORDERDATE) AS SALESYEAR,
    C.COUNTRYNAME,
    S.SEGMENTNAME,
    SP.FIRSTNAME,
    SP.LASTNAME,
    P.PRODUCTNAME,
    COUNT(*) AS TOTALPRODUCTSALES,
    SUM(
        CASE
            WHEN SLI.PROMOTIONID = 0 THEN 0
            ELSE 1
        END
    ) AS TOTALPROMOTIONALSALES
FROM
    SALESORDERLINEITEM SLI
    INNER JOIN PRODUCT P ON P.PRODUCTID = SLI.PRODUCTID
    INNER JOIN SALESORDER SO ON SO.SALESORDERID = SLI.SALESORDERID
    INNER JOIN SALESREGION SR ON SR.SALESREGIONID = SO.SALESREGIONID
    INNER JOIN SALESPERSON SP ON SP.SALESPERSONID = SR.SALESPERSONID
    INNER JOIN REGION R ON R.REGIONID = SR.REGIONID
    INNER JOIN SEGMENT S ON S.SEGMENTID = R.SEGMENTID
    INNER JOIN COUNTRY C ON C.COUNTRYID = R.COUNTRYID
WHERE
    YEAR(SO.SALESORDERDATE) > 2012
GROUP BY
    YEAR(SO.SALESORDERDATE),
    C.COUNTRYNAME,
    S.SEGMENTNAME,
    SP.FIRSTNAME,
    SP.LASTNAME,
    P.PRODUCTNAME;

The clustered index scan has a cost of 48% and a duration of around 0.389s.

I have tried the following and it reduced the cost to 38%, with a duration of around 0.211s, however I would like to design an index to remove the clustered index scan on SalesOrderLineItem.

CREATE  INDEX IDX ON SALESORDERLINEITEM (PRODUCTID, PROMOTIONID, SALESORDERID);

GO

SELECT
    YEAR(SO.SALESORDERDATE) AS SALESYEAR,
    C.COUNTRYNAME,
    S.SEGMENTNAME,
    SP.FIRSTNAME,
    SP.LASTNAME,
    P.PRODUCTNAME,
    COUNT(*) AS TOTALPRODUCTSALES,
    SUM(
        CASE
            WHEN SLI.PROMOTIONID = 0 THEN 0
            ELSE 1
        END
    ) AS TOTALPROMOTIONALSALES
FROM
    SALESORDERLINEITEM SLI
    INNER JOIN PRODUCT P ON P.PRODUCTID = SLI.PRODUCTID
    INNER JOIN SALESORDER SO ON SO.SALESORDERID = SLI.SALESORDERID
    INNER JOIN SALESREGION SR ON SR.SALESREGIONID = SO.SALESREGIONID
    INNER JOIN SALESPERSON SP ON SP.SALESPERSONID = SR.SALESPERSONID
    INNER JOIN REGION R ON R.REGIONID = SR.REGIONID
    INNER JOIN SEGMENT S ON S.SEGMENTID = R.SEGMENTID
    INNER JOIN COUNTRY C ON C.COUNTRYID = R.COUNTRYID
WHERE
    YEAR(SO.SALESORDERDATE) > 2012
GROUP BY
    YEAR(SO.SALESORDERDATE),
    C.COUNTRYNAME,
    S.SEGMENTNAME,
    SP.FIRSTNAME,
    SP.LASTNAME,
    P.PRODUCTNAME;

What do I need to do to remove this and decrease the overall cost/execution time?


Solution

  • There simply isn't enough info here to give a definitive answer; we don't know the layout of your tables nor the indexes available nor the amount and spread of data in there.

    That said, as mentioned by Larnu, don't use WHERE function(field) = ..., it makes it (virtually) impossible to optimally use indexes on said field. Going by the plan provided you already did that.

    From the plan it looks like you're having

    • a table-scan on SALESORDER because of the filter on SO.SALESORDERDATE and the fact you also need SO.SALESREGIONID and SALESORDERID on top of SO.SALESORDERDATE from that table. Adding a covering index might help out here.
    • The reason the SALESORDERLINEITEM part of costly is because it's that big. The system has to go through near 125 million row in a rather non-optimal way, the fact that your query runs in sub-second is in fact pretty impressive if you think about it.
    • You already managed to get rid of the Clustered Index Scan on the SALESORDERLINEITEM table by adding your index but the reason the index is chosen over the table is probably because it's 'slimmer' than the actual table, not so much because it's helping as 'an index'. I would suggest to rather put SALESORDERID in front as then the system will more likely be able to first filter on the requested year in SALESORDER and use this already limited list to only have to read the necessary records from SALESORDERLINEITEM based on the corresponding SALESORDERID values.

    So, try this:

    CREATE INDEX idx_test ON SALESORDERLINEITEM (SALESORDERID, PRODUCTID) INCLUDE (PROMOTIONID);
    CREATE INDEX idx_test ON SALESORDER (SALESORDERDATE, SALESORDERID) INCLUDE (SALESREGIONID);
    
    SELECT
        YEAR(SO.SALESORDERDATE) AS SALESYEAR,
        C.COUNTRYNAME,
        S.SEGMENTNAME,
        SP.FIRSTNAME,
        SP.LASTNAME,
        P.PRODUCTNAME,
        COUNT(*) AS TOTALPRODUCTSALES,
        SUM(
            CASE
                WHEN SLI.PROMOTIONID = 0 THEN 0
                ELSE 1
            END
        ) AS TOTALPROMOTIONALSALES
    FROM
        SALESORDERLINEITEM SLI
        INNER JOIN PRODUCT P ON P.PRODUCTID = SLI.PRODUCTID
        INNER JOIN SALESORDER SO ON SO.SALESORDERID = SLI.SALESORDERID
                                AND SO.SALESORDERDATE >= Convert(datetime, '20120101')
        INNER JOIN SALESREGION SR ON SR.SALESREGIONID = SO.SALESREGIONID
        INNER JOIN SALESPERSON SP ON SP.SALESPERSONID = SR.SALESPERSONID
        INNER JOIN REGION R ON R.REGIONID = SR.REGIONID
        INNER JOIN SEGMENT S ON S.SEGMENTID = R.SEGMENTID
        INNER JOIN COUNTRY C ON C.COUNTRYID = R.COUNTRYID
        
    GROUP BY
        YEAR(SO.SALESORDERDATE),
        C.COUNTRYNAME,
        S.SEGMENTNAME,
        SP.FIRSTNAME,
        SP.LASTNAME,
        P.PRODUCTNAME;
    

    Post your new plan after applying this to see if (and how) it worked out.

    PS: The reason I'm adding PRODUCTID to the index on SALESORDERLINEITEM too is because it then can fetch those values pre-sorted; MAYBE the optimizer can put that to good use when further joining to the PRODUCT table. Same with SALESORDERID on the SALESORDER index