Search code examples
sqlsql-serversql-server-2005common-table-expressiondatabase-performance

Convert CTE to better performing query


I have a CTE query that updates a single column in a table with approx 2.5 millions rows. I let the query run and it took about 16 hours!! How can I update this procedure so it will be faster? I read that SELECT INTO and create a new table should be a better way. I just dont know how to convert this CTE into a SELECT INTO.

WITH CubeWithRowNumber
AS (
    SELECT rownum = ROW_NUMBER() OVER (
            ORDER BY CustomerId,
                Period
            ),
        c.Period,
        c.CustomerId,
        c.PayDate,
        NS_Regular,
        NS_Single,
        NySales
    FROM Cube2 c
    )
UPDATE Cube2
SET MonthlySales = (
    SELECT 
       CASE 
         WHEN YEAR(cu.Period) = YEAR(cu.PayDate)
             THEN cu.NySales
         ELSE 
           CASE 
             WHEN prev.Period IS NULL 
               OR YEAR(cu.Period) <> YEAR(prev.Period)
                 THEN cu.NS_Regular + cu.NS_Single
             ELSE cu.NS_Regular + cu.NS_Single - prev.NS_Regular - prev.NS_Single
           END
         END AS Result
     FROM CubeWithRowNumber cu
     LEFT JOIN CubeWithRowNumber prev
         ON prev.rownum = cu.rownum - 1
             AND cu.CustomerId = prev.CustomerId
     WHERE cu.CustomerId = Cube2.CustomerId
         AND cu.Period = Cube2.Period)

Solution

  • It's possible to avoid calling the CTE twice by remodelling the query

    UPDATE Cube2 SET
      MonthlySales = CASE WHEN YEAR(cu.Period) = YEAR(cu.PayDate) 
                               THEN cu.NySales
                          WHEN YEAR(cu.Period) <> YEAR(COALESCE(prev.Period, 0)) 
                               THEN cu.NS_Regular + cu.NS_Single
                          ELSE cu.NS_Regular + cu.NS_Single
                             - prev.NS_Regular - prev.NS_Single
                     END
    FROM Cube2 cu
         CROSS APPLY (SELECT TOP 1 Period, NS_Regular, NS_Single 
                      FROM   cube2
                      WHERE  cu.CustomerId = cube2.CustomerId
                        AND  cu.Period > cube2.Period
                      ORDER BY Period Desc) prev;
    

    that can improve the performances, even more if supported by an index on CustomerId and Period, but introduce an ORDER BY that is somewhat costly, so you may want to check it on a reduced set of data.

    Another little trouble is that CROSS APPLY is like a INNER JOIN and the first period for every customer have not previous period. To fix that it's possible to change the CROSS APPLY to an OUTER APPLY that is like a LEFT JOIN, but that will destroy the perfomances, or we can create some values from nothing. Coalescing a grouping function can do that: if there is the row its values it will remain the same, if the subquery is empty the MAX (or MIN or AVG, your choice) will create a new row, as the MAX of a table with no rows is NULL.

    The updated UPDATE is:

    UPDATE Cube2 SET
      MonthlySales = CASE WHEN YEAR(cu.Period) = YEAR(cu.PayDate) 
                               THEN cu.NySales
                          WHEN YEAR(cu.Period) <> YEAR(COALESCE(prev.Period, 0)) 
                               THEN cu.NS_Regular + cu.NS_Single
                          ELSE cu.NS_Regular + cu.NS_Single
                             - prev.NS_Regular - prev.NS_Single
                     END
    FROM Cube2 cu
         CROSS APPLY (SELECT COALESCE(MAX(Period), 0) Period
                           , COALESCE(MAX(NS_Regular), 0) NS_Regular
                           , COALESCE(MAX(NS_Single), 0) NS_Single
                      FROM   (SELECT TOP 1 Period, NS_Regular, NS_Single 
                              FROM   cube2
                              WHERE  cu.CustomerId = cube2.CustomerId
                                AND  cu.Period > cube2.Period
                              ORDER BY Period Desc) a
                      ) prev;
    

    there is some extra work for the grouping, but hopefully not that much.

    Sometime converting the CASE logic to math operator can help to further improve performances but in addition to the fact that it doesn't work always the query become less readable.

    If you want to try it here is the version converted

    UPDATE Cube2 SET
      MonthlySales 
      = cu.NySales * (1 - CAST((YEAR(cu.Period) - YEAR(cu.PayDate)) as BIT))
      + (cu.NS_Regular + cu.NS_Single)
      * (0 + CAST(YEAR(cu.Period) - YEAR(COALESCE(prev.Period, 0)) as BIT))
      * (0 + CAST((YEAR(cu.Period) - YEAR(cu.PayDate)) as BIT))
      + (cu.NS_Regular + cu.NS_Single - prev.NS_Regular - prev.NS_Single)
      * (1 - CAST(YEAR(cu.Period) - YEAR(COALESCE(prev.Period, 0)) as BIT))
      * (0 + CAST((YEAR(cu.Period) - YEAR(cu.PayDate)) as BIT))
    FROM Cube2 cu
         CROSS APPLY (SELECT COALESCE(MAX(Period), 0) Period
                           , COALESCE(MAX(NS_Regular), 0) NS_Regular
                           , COALESCE(MAX(NS_Single), 0) NS_Single
                      FROM   (SELECT TOP 1 Period, NS_Regular, NS_Single 
                              FROM   cube2
                              WHERE  cu.CustomerId = cube2.CustomerId
                                AND  cu.Period > cube2.Period
                              ORDER BY Period Desc) a
                      ) prev;