Search code examples
sql-servert-sqlmergenullintersect

T-SQL MERGE won't handle NULLs as expected


If this query:

SELECT CONCAT(SOURCE.OrderNo, '_', SOURCE.OrderLine), 
       SOURCE.ProdOrder, 
       SOURCE.Lvl1, 
       SOURCE.Lvl2, 
       SOURCE.Lvl3, 
       SOURCE.LastDate 
FROM   dbo.SourceTbl AS SOURCE 

returns 11 records and this query:

SELECT CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine), 
       TARGET.ProdOrder, 
       TARGET.Lvl1, 
       TARGET.Lvl2, 
       TARGET.Lvl3, 
       TARGET.LastDate 
FROM   dbo.TargetTbl AS TARGET 

returns 17 records and the INTERSECT between the two:

SELECT CONCAT(SOURCE.OrderNo, '_', SOURCE.OrderLine), 
       SOURCE.ProdOrder, 
       SOURCE.Lvl1, 
       SOURCE.Lvl2, 
       SOURCE.Lvl3, 
       SOURCE.LastDate 
FROM   dbo.SourceTbl AS SOURCE 
INTERSECT 
SELECT CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine), 
       TARGET.ProdOrder, 
       TARGET.Lvl1, 
       TARGET.Lvl2, 
       TARGET.Lvl3, 
       TARGET.LastDate 
FROM   dbo.TargetTbl AS TARGET 

returns 9 records, when I do a MERGE like this:

MERGE dbo.TargetTbl AS TARGET
USING (
       SELECT   OrderNo, OrderLine, CONCAT(OrderNo, '_', OrderLine) AS OrderNoLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3,
                MAX(LastDate) AS LastDate
       FROM dbo.SourceTbl
       GROUP BY OrderNo, OrderLine, CONCAT(OrderNo, '_', OrderLine), SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3
      ) AS SOURCE 
      ON CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine) = OrderNoLine 
         AND TARGET.ProdOrder = SOURCE.ProdOrder
         AND TARGET.Lvl1 = SOURCE.Lvl1
         AND TARGET.Lvl2 = SOURCE.Lvl2
         AND TARGET.Lvl3 = SOURCE.Lvl3  
         AND TARGET.LastDate = SOURCE.LastDate
WHEN MATCHED AND EXISTS (SELECT CONCAT(SOURCE.OrderNo, '_', SOURCE.OrderLine)
                               ,SOURCE.ProdOrder
                               ,SOURCE.Lvl1
                               ,SOURCE.Lvl2
                               ,SOURCE.Lvl3 
                               ,SOURCE.LastDate
                         INTERSECT 
                         SELECT CONCAT(TARGET.OrderNo, '_', TARGET.OrderLine)
                               ,TARGET.ProdOrder
                               ,TARGET.Lvl1
                               ,TARGET.Lvl2
                               ,TARGET.Lvl3
                               ,TARGET.LastDate
                        )
THEN UPDATE SET TARGET.IsBlocked = 1, TARGET.BlockDate = GETDATE()
WHEN NOT MATCHED BY TARGET 
THEN INSERT (LastDate, UsrID, DepID, OrderNo, OrderLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, IsBlocked, BlockDate)
     VALUES (SOURCE.LastDate, 999, 999, SOURCE.OrderNo, SOURCE.OrderLine, SOURCE.SomeModel, SOURCE.ProdOrder, SOURCE.Lvl1, SOURCE.Lvl2, SOURCE.Lvl3, 1, GETDATE());

it should, according to this and this, UPDATE the 9 INTERSECT records of the TargetTbl and INSERT to that same table the remaining 2 records from the SourceTbl (11 in total). Instead it updates 4 records and inserts 6 records (10 in total). Two records in the SourceTbl are duplicated and that's the reason of 10 instead of 11 and that's also why I used the MAX & GROUP BY.

I think it's the first part of the query, the USING part, that cannot handle NULLs correctly even if the INTERSECT part does its job. I tried everything I was able to, but no success. I'm sure it's something easily doable, so please, help me. Thank you.

EDIT: SourceTbl data by using SELECT OrderNo, OrderLine, CONCAT(OrderNo, '_', OrderLine) AS OrderNoLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, LastDate AS LastDate FROM dbo.SourceTbl ORDER BY OrderNo, OrderLine, SomeModel, ProdOrder, irrelevant columns are omitted:

OrderNo OrderLine   OrderNoLine SomeModel   ProdOrder   Lvl1    Lvl2    Lvl3    LastDate
123c08637   10  123c08637_10    4321525175_004321   A5C008837   Abcd    Efgh    Olol    04/03/2030
123c11214   10  123c11214_10    4321532622_000391   NULL    NULL    NULL    NULL    07/07/2018
123c13039   10  123c13039_10    4321525175_002611   A5C014838   NULL    NULL    NULL    18/05/2018
123c16059   10  123c16059_10    4321541488_001111   A5C018611   NULL    NULL    NULL    18/05/2018
123c17482   10  123c17482_10    4321506480_001711   A5C019227   Asdf    Ghjk    Cvnm    12/12/2018
123c17482   10  123c17482_10    4321506480_001711   A5C047712   Asdf    Ghjk    Cvnm    12/12/2018
123c17482   20  123c17482_20    4321506480_001712   A5B072554   aaaa    bbbb    cccc    18/05/2018
123c17482   20  123c17482_20    4321506480_001712   A5B072554   aaaa    bbbb    cccc    18/05/2018
123c17482   20  123c17482_20    4321506480_001712   A5B072554   aaaa    bbbb    xxxx    18/05/2018
123c17482   20  123c17482_20    4321506480_001712   A5B200472   NULL    NULL    NULL    18/05/2018
123c32405   10  123c32405_10    8765525667_005301   NULL    Qwer    Uiop    Tygh    12/12/2018

Solution

  • The GROUP BY might reduce the number of records to only one (if the 11 records only differ in the LastDate column and if SomeModel contains the same value for all 11 records) or it might result in all 11 records (if SomeModel contains unique values), so that GROUP BY does not neccessarily resturn the 10 distinct rows. To achieve this, use SELECT DISTINCT instead of grouping by a subset of the columns.

    Also, if the ON condition worked as you seem to expect it, the additional EXISTS condition was obsolete. Obviously, 4 matches are found and 6 records have no match. Within these 6, there could be 2 records that indeed have no match and 4 records that do not match because of NULL values.

    To take care of NULL values, I suggest to change the whole statement to something like this:

    MERGE dbo.TargetTbl AS TARGET
    USING (
           SELECT DISTINCT OrderNo, OrderLine, ProdOrder, Lvl1, Lvl2, Lvl3, LastDate
           FROM dbo.SourceTbl
          ) AS SOURCE 
          ON     (TARGET.OrderNo = SOURCE.OrderNo OR TARGET.OrderNo IS NULL AND SOURCE.OrderNo IS NULL)
             AND (TARGET.OrderLine = SOURCE.OrderLine OR TARGET.OrderLine IS NULL AND SOURCE.OrderLine IS NULL)
             AND (TARGET.ProdOrder = SOURCE.ProdOrder OR TARGET.ProdOrder IS NULL AND SOURCE.ProdOrder IS NULL)
             AND (TARGET.Lvl1 = SOURCE.Lvl1 OR TARGET.Lvl1 IS NULL AND SOURCE.Lvl1 IS NULL)
             AND (TARGET.Lvl2 = SOURCE.Lvl2 OR TARGET.Lvl2 IS NULL AND SOURCE.Lvl2 IS NULL)
             AND (TARGET.Lvl3 = SOURCE.Lvl3 OR TARGET.Lvl3 IS NULL AND SOURCE.Lvl3 IS NULL)
             AND (TARGET.LastDate = SOURCE.LastDate OR TARGET.LastDate IS NULL AND SOURCE.LastDate IS NULL)
    WHEN MATCHED 
    THEN UPDATE SET TARGET.IsBlocked = 1, TARGET.BlockDate = GETDATE()
    WHEN NOT MATCHED BY TARGET 
    THEN INSERT (LastDate, UsrID, DepID, OrderNo, OrderLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, IsBlocked, BlockDate)
         VALUES (LastDate, 999, 999, OrderNo, OrderLine, SomeModel, ProdOrder, Lvl1, Lvl2, Lvl3, 1, GETDATE());