I'm having some performance issues due to full scans being run on some larger tables for a report. I've narrowed things down to this section of the query but can't figure out how to avoid the scans without changing the results.
To explain, we have a data archiving system that copies data from the live table to the archive table daily. The data is not removed from the live table until a period of time has passed. This results in a state where the live table and archive table will both have the same rows, but the data in the rows may not match.
This rules out a UNION query (which would eliminate the full scans). The requirements are for the report to show live data, so I also can't query just the archive table.
Any ideas? Here is the query. The primary keys of both tables is DetailIdent, but I do have an index on OrderIdent, as it's a foreign key back to the parent table. You can see that we take the main table results if they exist, otherwise we fall back to the archive data.
SELECT COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) AS OrderIdent,
COALESCE(RegOD.Quantity, ArcOD.Quantity) AS Quantity,
COALESCE(RegOD.LoadQuan, ArcOD.LoadQuan) AS LoadQuan,
COALESCE(RegOD.ShipQuan, ArcOD.ShipQuan) AS ShipQuan,
COALESCE(RegOD.RcvdQuan, ArcOD.RcvdQuan) AS RcvdQuan,
COALESCE(RegOD.UOM, ArcOD.UOM) AS UOM,
COALESCE(RegOD.SkidType, ArcOD.SkidType) AS SkidType,
COALESCE(RegOD.Product, ArcOD.Product) AS Product,
COALESCE(RegOD.PkgCode, ArcOD.PkgCode) AS PkgCode
FROM OrderDetail RegOD
FULL JOIN dbo.ArcOrderDtl ArcOD
ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) = 717010
The filtering predicate COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) = 717010
is killing performance and it's forcing the engine to perform a full scan first, and filter data later.
Rephrase the COALESCE()
function and let the engine do its work. With a bit of luck the engine will be smart enough to find the optimization. In this case the query can take the form:
SELECT
COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) AS OrderIdent,
COALESCE(RegOD.Quantity,ArcOD.Quantity) AS Quantity,
COALESCE(RegOD.LoadQuan,ArcOD.LoadQuan) AS LoadQuan,
COALESCE(RegOD.ShipQuan,ArcOD.ShipQuan) AS ShipQuan,
COALESCE(RegOD.RcvdQuan,ArcOD.RcvdQuan) AS RcvdQuan,
COALESCE(RegOD.UOM,ArcOD.UOM) AS UOM,
COALESCE(RegOD.SkidType,ArcOD.SkidType) AS SkidType,
COALESCE(RegOD.Product,ArcOD.Product) AS Product,
COALESCE(RegOD.PkgCode,ArcOD.PkgCode) AS PkgCode
FROM OrderDetail RegOD
FULL JOIN dbo.ArcOrderDtl ArcOD ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE RegOD.OrderIdent = 717010 or ArcOD.OrderIdent = 717010
If the engine doesn't optimize Option #1 above, you can still try combining a left join with a right anti-join instead of writing a full join (they are equivalent). It's definitively more verbose, but in this case it clearly shows the engine what to do. This query could look like:
SELECT -- left join here
COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) AS OrderIdent,
COALESCE(RegOD.Quantity,ArcOD.Quantity) AS Quantity,
COALESCE(RegOD.LoadQuan,ArcOD.LoadQuan) AS LoadQuan,
COALESCE(RegOD.ShipQuan,ArcOD.ShipQuan) AS ShipQuan,
COALESCE(RegOD.RcvdQuan,ArcOD.RcvdQuan) AS RcvdQuan,
COALESCE(RegOD.UOM,ArcOD.UOM) AS UOM,
COALESCE(RegOD.SkidType,ArcOD.SkidType) AS SkidType,
COALESCE(RegOD.Product,ArcOD.Product) AS Product,
COALESCE(RegOD.PkgCode,ArcOD.PkgCode) AS PkgCode
FROM OrderDetail RegOD
LEFT JOIN dbo.ArcOrderDtl ArcOD ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE RegOD.OrderIdent = 717010
UNION ALL
SELECT -- right anti-join here
OrderIdent,
Quantity,
LoadQuan,
ShipQuan,
RcvdQuan,
UOM,
SkidType,
Product,
PkgCode
FROM dbo.ArcOrderDtl ArcOD
LEFT JOIN OrderDetail RegOD ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE ArcOD.OrderIdent = 717010 and RegOD.DetailIdent IS NULL