I am trying to optimize a SQL query in SSMS. When I execute it and review the execution plan, there is a clustered index scan on my table called SalesOrderLineItem
.
Below is my SQL query
SELECT
YEAR(SO.SALESORDERDATE) AS SALESYEAR,
C.COUNTRYNAME,
S.SEGMENTNAME,
SP.FIRSTNAME,
SP.LASTNAME,
P.PRODUCTNAME,
COUNT(*) AS TOTALPRODUCTSALES,
SUM(
CASE
WHEN SLI.PROMOTIONID = 0 THEN 0
ELSE 1
END
) AS TOTALPROMOTIONALSALES
FROM
SALESORDERLINEITEM SLI
INNER JOIN PRODUCT P ON P.PRODUCTID = SLI.PRODUCTID
INNER JOIN SALESORDER SO ON SO.SALESORDERID = SLI.SALESORDERID
INNER JOIN SALESREGION SR ON SR.SALESREGIONID = SO.SALESREGIONID
INNER JOIN SALESPERSON SP ON SP.SALESPERSONID = SR.SALESPERSONID
INNER JOIN REGION R ON R.REGIONID = SR.REGIONID
INNER JOIN SEGMENT S ON S.SEGMENTID = R.SEGMENTID
INNER JOIN COUNTRY C ON C.COUNTRYID = R.COUNTRYID
WHERE
YEAR(SO.SALESORDERDATE) > 2012
GROUP BY
YEAR(SO.SALESORDERDATE),
C.COUNTRYNAME,
S.SEGMENTNAME,
SP.FIRSTNAME,
SP.LASTNAME,
P.PRODUCTNAME;
The clustered index scan has a cost of 48% and a duration of around 0.389s.
I have tried the following and it reduced the cost to 38%, with a duration of around 0.211s, however I would like to design an index to remove the clustered index scan on SalesOrderLineItem.
CREATE INDEX IDX ON SALESORDERLINEITEM (PRODUCTID, PROMOTIONID, SALESORDERID);
GO
SELECT
YEAR(SO.SALESORDERDATE) AS SALESYEAR,
C.COUNTRYNAME,
S.SEGMENTNAME,
SP.FIRSTNAME,
SP.LASTNAME,
P.PRODUCTNAME,
COUNT(*) AS TOTALPRODUCTSALES,
SUM(
CASE
WHEN SLI.PROMOTIONID = 0 THEN 0
ELSE 1
END
) AS TOTALPROMOTIONALSALES
FROM
SALESORDERLINEITEM SLI
INNER JOIN PRODUCT P ON P.PRODUCTID = SLI.PRODUCTID
INNER JOIN SALESORDER SO ON SO.SALESORDERID = SLI.SALESORDERID
INNER JOIN SALESREGION SR ON SR.SALESREGIONID = SO.SALESREGIONID
INNER JOIN SALESPERSON SP ON SP.SALESPERSONID = SR.SALESPERSONID
INNER JOIN REGION R ON R.REGIONID = SR.REGIONID
INNER JOIN SEGMENT S ON S.SEGMENTID = R.SEGMENTID
INNER JOIN COUNTRY C ON C.COUNTRYID = R.COUNTRYID
WHERE
YEAR(SO.SALESORDERDATE) > 2012
GROUP BY
YEAR(SO.SALESORDERDATE),
C.COUNTRYNAME,
S.SEGMENTNAME,
SP.FIRSTNAME,
SP.LASTNAME,
P.PRODUCTNAME;
What do I need to do to remove this and decrease the overall cost/execution time?
There simply isn't enough info here to give a definitive answer; we don't know the layout of your tables nor the indexes available nor the amount and spread of data in there.
That said, as mentioned by Larnu, don't use WHERE function(field) = ...
, it makes it (virtually) impossible to optimally use indexes on said field
. Going by the plan provided you already did that.
From the plan it looks like you're having
SALESORDER
because of the filter on SO.SALESORDERDATE
and the fact you also need SO.SALESREGIONID
and SALESORDERID
on top of SO.SALESORDERDATE
from that table. Adding a covering index might help out here.SALESORDERLINEITEM
part of costly is because it's that big. The system has to go through near 125 million row in a rather non-optimal way, the fact that your query runs in sub-second is in fact pretty impressive if you think about it.SALESORDERLINEITEM
table by adding your index but the reason the index is chosen over the table is probably because it's 'slimmer' than the actual table, not so much because it's helping as 'an index'. I would suggest to rather put SALESORDERID
in front as then the system will more likely be able to first filter on the requested year in SALESORDER
and use this already limited list to only have to read the necessary records from SALESORDERLINEITEM
based on the corresponding SALESORDERID
values.So, try this:
CREATE INDEX idx_test ON SALESORDERLINEITEM (SALESORDERID, PRODUCTID) INCLUDE (PROMOTIONID);
CREATE INDEX idx_test ON SALESORDER (SALESORDERDATE, SALESORDERID) INCLUDE (SALESREGIONID);
SELECT
YEAR(SO.SALESORDERDATE) AS SALESYEAR,
C.COUNTRYNAME,
S.SEGMENTNAME,
SP.FIRSTNAME,
SP.LASTNAME,
P.PRODUCTNAME,
COUNT(*) AS TOTALPRODUCTSALES,
SUM(
CASE
WHEN SLI.PROMOTIONID = 0 THEN 0
ELSE 1
END
) AS TOTALPROMOTIONALSALES
FROM
SALESORDERLINEITEM SLI
INNER JOIN PRODUCT P ON P.PRODUCTID = SLI.PRODUCTID
INNER JOIN SALESORDER SO ON SO.SALESORDERID = SLI.SALESORDERID
AND SO.SALESORDERDATE >= Convert(datetime, '20120101')
INNER JOIN SALESREGION SR ON SR.SALESREGIONID = SO.SALESREGIONID
INNER JOIN SALESPERSON SP ON SP.SALESPERSONID = SR.SALESPERSONID
INNER JOIN REGION R ON R.REGIONID = SR.REGIONID
INNER JOIN SEGMENT S ON S.SEGMENTID = R.SEGMENTID
INNER JOIN COUNTRY C ON C.COUNTRYID = R.COUNTRYID
GROUP BY
YEAR(SO.SALESORDERDATE),
C.COUNTRYNAME,
S.SEGMENTNAME,
SP.FIRSTNAME,
SP.LASTNAME,
P.PRODUCTNAME;
Post your new plan after applying this to see if (and how) it worked out.
PS: The reason I'm adding PRODUCTID
to the index on SALESORDERLINEITEM
too is because it then can fetch those values pre-sorted; MAYBE the optimizer can put that to good use when further joining to the PRODUCT
table. Same with SALESORDERID
on the SALESORDER
index