Search code examples
sqlsql-servert-sqlquery-optimization

SQL - Avoid full scans when joining archive tables


I'm having some performance issues due to full scans being run on some larger tables for a report. I've narrowed things down to this section of the query but can't figure out how to avoid the scans without changing the results.

To explain, we have a data archiving system that copies data from the live table to the archive table daily. The data is not removed from the live table until a period of time has passed. This results in a state where the live table and archive table will both have the same rows, but the data in the rows may not match.

This rules out a UNION query (which would eliminate the full scans). The requirements are for the report to show live data, so I also can't query just the archive table.

Any ideas? Here is the query. The primary keys of both tables is DetailIdent, but I do have an index on OrderIdent, as it's a foreign key back to the parent table. You can see that we take the main table results if they exist, otherwise we fall back to the archive data.

SELECT COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) AS OrderIdent,
                   COALESCE(RegOD.Quantity, ArcOD.Quantity) AS Quantity,
                   COALESCE(RegOD.LoadQuan, ArcOD.LoadQuan) AS LoadQuan,
                   COALESCE(RegOD.ShipQuan, ArcOD.ShipQuan) AS ShipQuan,
                   COALESCE(RegOD.RcvdQuan, ArcOD.RcvdQuan) AS RcvdQuan,
                   COALESCE(RegOD.UOM, ArcOD.UOM) AS UOM,
                   COALESCE(RegOD.SkidType, ArcOD.SkidType) AS SkidType,
                   COALESCE(RegOD.Product, ArcOD.Product) AS Product,
                   COALESCE(RegOD.PkgCode, ArcOD.PkgCode) AS PkgCode
            FROM OrderDetail RegOD
                FULL JOIN dbo.ArcOrderDtl ArcOD
                    ON ArcOD.DetailIdent = RegOD.DetailIdent
                    WHERE COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) = 717010

execution plan showing two clustered index full scans


Solution

  • The filtering predicate COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) = 717010 is killing performance and it's forcing the engine to perform a full scan first, and filter data later.

    Option 1 - Rephrase the COALESCE() function

    Rephrase the COALESCE() function and let the engine do its work. With a bit of luck the engine will be smart enough to find the optimization. In this case the query can take the form:

    SELECT
      COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) AS OrderIdent,
      COALESCE(RegOD.Quantity,ArcOD.Quantity) AS Quantity,
      COALESCE(RegOD.LoadQuan,ArcOD.LoadQuan) AS LoadQuan,
      COALESCE(RegOD.ShipQuan,ArcOD.ShipQuan) AS ShipQuan,
      COALESCE(RegOD.RcvdQuan,ArcOD.RcvdQuan) AS RcvdQuan,
      COALESCE(RegOD.UOM,ArcOD.UOM) AS UOM,
      COALESCE(RegOD.SkidType,ArcOD.SkidType) AS SkidType,
      COALESCE(RegOD.Product,ArcOD.Product) AS Product,
      COALESCE(RegOD.PkgCode,ArcOD.PkgCode) AS PkgCode
    FROM OrderDetail RegOD 
    FULL JOIN dbo.ArcOrderDtl ArcOD ON ArcOD.DetailIdent = RegOD.DetailIdent
    WHERE RegOD.OrderIdent = 717010 or ArcOD.OrderIdent = 717010
    

    Option 2 - Combine a left join with a right anti-join instead of using a full join

    If the engine doesn't optimize Option #1 above, you can still try combining a left join with a right anti-join instead of writing a full join (they are equivalent). It's definitively more verbose, but in this case it clearly shows the engine what to do. This query could look like:

    SELECT -- left join here
      COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) AS OrderIdent,
      COALESCE(RegOD.Quantity,ArcOD.Quantity) AS Quantity,
      COALESCE(RegOD.LoadQuan,ArcOD.LoadQuan) AS LoadQuan,
      COALESCE(RegOD.ShipQuan,ArcOD.ShipQuan) AS ShipQuan,
      COALESCE(RegOD.RcvdQuan,ArcOD.RcvdQuan) AS RcvdQuan,
      COALESCE(RegOD.UOM,ArcOD.UOM) AS UOM,
      COALESCE(RegOD.SkidType,ArcOD.SkidType) AS SkidType,
      COALESCE(RegOD.Product,ArcOD.Product) AS Product,
      COALESCE(RegOD.PkgCode,ArcOD.PkgCode) AS PkgCode
    FROM OrderDetail RegOD 
    LEFT JOIN dbo.ArcOrderDtl ArcOD ON ArcOD.DetailIdent = RegOD.DetailIdent
    WHERE RegOD.OrderIdent = 717010
    UNION ALL
    SELECT -- right anti-join here
      OrderIdent,
      Quantity,
      LoadQuan,
      ShipQuan,
      RcvdQuan,
      UOM,
      SkidType,
      Product,
      PkgCode
    FROM dbo.ArcOrderDtl ArcOD
    LEFT JOIN OrderDetail RegOD ON ArcOD.DetailIdent = RegOD.DetailIdent
    WHERE ArcOD.OrderIdent = 717010 and RegOD.DetailIdent IS NULL