Search code examples
sql-serverperformancet-sqlcursor

SQL Server Cursor performance issue


I need help with optimizing a cursor or changing the code completely. I have the below requirement:

Create column Sequence grouped by ColumnA, ColumnD and GroupA. StartA is used for sorting. Have tried using LAG, Row_Number, etc with no joy since the grouping sequence restart on change of Column D taking into account ColumnA (that can repeat) and GroupA sorted by StartA.

The code below works fine for a small set of records but last time I run it took over 3 hours and did not complete so I have killed the job. The table has over 700,000 records. Looking for any tips on how to improve this. Thank you! Sample result using DENSE_RANK:

enter image description here

DECLARE 
        @ColumnA VARCHAR(10),
        @StartA DATETIME,
        @ColumnD VARCHAR(50),
        @Sequence INTEGER,
        @Sequence_Calc INTEGER = 1,
        @Previous_ColumnA VARCHAR(10),
        @Previous_ColumnD VARCHAR(50)

SELECT *
    INTO #Temp_Table
    FROM TABLEA
    ORDER BY ColumnA,
        ColumnD

DECLARE Seq_Cursor CURSOR 

    FOR SELECT  ColumnA,
             StartA,
             ColumnD,
             Sequence
      FROM #Temp_Table
      ORDER BY ColumnA,
        ColumnD

FOR UPDATE OF Sequence

OPEN Seq_Cursor

    FETCH NEXT FROM Seq_Cursor
        INTO    @ColumnA, @StartA, @ColumnD, @Sequence 

WHILE @@FETCH_STATUS= 0
BEGIN
    BEGIN
        UPDATE #Temp_Table
        SET Sequence = @Sequence_Calc
        WHERE ColumnD = @ColumnD
        AND StartA = @StartA
        AND ColumnA = @ColumnA

        SET @Previous_ColumnA = @ColumnA
        SET @Previous_ColumnD = @ColumnD
    END

    FETCH NEXT FROM Seq_Cursor
    INTO     @ColumnA, @StartA, @ColumnD, @Sequence
    
    BEGIN 
         SELECT @Sequence_Calc = CASE WHEN @Previous_ColumnD = @ColumnD THEN 
                                 CASE WHEN @Previous_ColumnA <> @ColumnA THEN @Sequence_Calc + 1 ELSE @Sequence_Calc END 
                                 ELSE 1 END 
    END
END

CLOSE Seq_Cursor
DEALLOCATE Seq_Cursor

Solution

  • Not sure why you're messing around with cursors, they are slow and inefficient, complex to write and complex to understand.

    It's really hard to tell without a fuller explanation of the desired logic, but it seems it's a Gaps-and-Islands problem.

    You need to use

    • LAG to mark the rows that are the start of a new group
    • Then use a windowed conditional COUNT to create a group ID
    • Then use ROW_NUMBER partitioned by that ID.
    WITH StartValues AS (
        SELECT *,
          CASE WHEN
              ColumnA = LAG(ColumnA) OVER (PARTITION BY ColumnD ORDER BY StartA)
              AND GroupA = LAG(GroupA) OVER (PARTITION BY ColumnD ORDER BY StartA)
            THEN NULL ELSE 1 END AS IsStart
        FROM TABLEA a
    ),
    Grouped AS (
        SELECT *,
          COUNT(IsStart) OVER (PARTITION BY ColumnD ORDER BY StartA) AS GroupID
        FROM Grouped
    )
    SELECT *,
      ROW_NUMBER() OVER (PARTITION BY ColumnD, GroupID ORDER BY StartA) AS Sequence
    FROM Grouped;