Search code examples
sqlsql-servergroup-bymergesum

How to merge 2 or more rows and SUM a column in an UPDATE without a primary key?


I have a table in SQL Server like this:

CTransaction CQuantity CItemCode CUOM
001 2 0011 PACK
001 5 0011 PACK
001 1 0011 PACK
002 3 0011 PACK
002 3 0011 PACK

Basically this table contains item details with corresponding CTransaction. I want to merge and update rows with the same CTransaction and CItemCode while summing up their CQuantity.

After the update, as a result it would become something like the table below...

CTransaction CQuantity CItemCode CUOM
001 8 0011 PACK
002 6 0011 PACK

I found a similar question in here, however the example includes a primary key. My table doesn't have a primary key and I have no idea how to do this. Is this even possible? What can I try next?

This would be easy if I was just going to SELECT the rows like this:

SELECT 
    CTransaction, SUM(CQuantity), CItemCode, CUOM 
FROM 
    Table 
GROUP BY 
    CTransaction, CItemCode, CUOM

But I don't want this. I want to update the table to actually remove the duplicates permanently.


Solution

  • Every table should have a primary key. In this case the primary key should presumably be CTransaction, CItemCode, CUOM and you should fix the process to upsert this ongoing rather than add new duplicate rows.

    It is possible to use MERGE for this though. One way is below (to update an arbitrary one per group and delete the rest).

    WITH Tgt AS
    (
    SELECT *, 
            SumCQuantity = SUM(CQuantity) OVER (PARTITION BY CTransaction, CItemCode, CUOM ),
            rn = ROW_NUMBER() OVER (PARTITION BY CTransaction, CItemCode, CUOM  ORDER BY CTransaction) 
    FROM YourTable
    ) 
    Merge Tgt with (HOLDLOCK)
    USING (VALUES(1)) src(rn) ON Tgt.rn = 1
    WHEN MATCHED AND CQuantity IS DISTINCT FROM SumCQuantity THEN 
        UPDATE SET CQuantity = SumCQuantity 
    WHEN NOT MATCHED BY SOURCE THEN 
        DELETE;
    

    Fiddle

    Or another version provided by @Charlieface in the comments (that may be more efficient as avoids the common sub expression spool by only calculating the running sum - which can be done in a more streaming fashion as no need to add the calculated value back in to previous rows)

    WITH target AS (
        SELECT *,
          SUM(t.CQuantity) OVER (PARTITION BY CTransaction, CItemCode ORDER BY CQuantity ROWS UNBOUNDED PRECEDING) AS TotalQty,
          LEAD(1) OVER (PARTITION BY CTransaction, CItemCode ORDER BY CQuantity) AS IsNotLast
        FROM YourTable t
    )
    MERGE target t
    USING (VALUES(1)) AS v(dummy)
    ON 1=1
    WHEN MATCHED AND IsNotLast = 1 THEN DELETE
    WHEN MATCHED AND CQuantity IS DISTINCT FROM TotalQty THEN
      UPDATE
        SET CQuantity = TotalQty;
    

    Fiddle 2