Search code examples
sql-servert-sqlinsertinsert-select

Avoid duplicate keys on INSERT INTO/SELECT


I am trying to do an Insert/Select but I am getting a duplicate key error.

INSERT INTO dbo.DESTINATION_TABLE
                (
                    DocNumber,
                    TenantId,
                    UserId,
                    OtherField
                )
                SELECT
                    po.DocNumber,
                    po.TenantId,
                    po.CreatedById,
                    po.OtherField
                FROM dbo.SOURCE_TABLE po
                WHERE
                    po.DeletedById IS NULL AND
                    NOT EXISTS(
                        SELECT * FROM dbo.DESTINATION_TABLE poa 
                        WHERE
                           poa.DocNumber = po.DocNumber AND
                           poa.TenantId = po.TenantId
                    ) 

DESTINATION_TABLE has a composite primary key of DocNumber and TenantId. DESTINATION_TABLE is empty at the time of running.

SOURCE_TABLE has a primary key of 'SourceTableId'.

But I keep getting an error

Violation of PRIMARY KEY constraint 'PK_dbo.DESTINATION_TABLE'. Cannot insert duplicate key in object 'dbo.DESTINATION_TABLE'. The duplicate key value is (DOC-99, some-tenant).

I have also tried

MERGE INTO DESTINATION_TABLE poa
USING (
    SELECT
        x.DocNumber,
        x.TenantId,
        x.CreatedById,
        x.OtherField
    FROM dbo.SOURCE_TABLE x
    WHERE x.DeletedById IS NULL
) po
ON (poa.TenantId = po.TenantId AND poa.DocNumber = po.DocNumber)
WHEN NOT MATCHED THEN INSERT (DocNumber, TenantId, UserId, OtherField)
VALUES(po.DocNumber, po.TenantId, po.CreatedById, po.OtherField);

But I get the exact same result.

How is this happening? Is it because it is checking for 'NOT EXISTS' before running the insert? How do I fix this?


Solution

  • It seems that the flaw here is your understanding of how SQL works. SQL is a set-based language, so it works with the data in sets. For the above, this means that you define the data you want to INSERT with your SELECT and then all the rows that you define are INSERTed. For your EXISTS this means that it only checks against rows that exist in the table prior to any of the rows being inserted, it does not insert each row one at a time and validate the EXISTS prior to each row.

    Let's take a very basic example with a single column table:

    CREATE TABLE dbo.SomeTable (ID int CONSTRAINT PK_SomeTable PRIMARY KEY);
    INSERT INTO dbo.SomeTable (ID)
    VALUES(1),(3);
    

    Now, let's say we want to insert the following dataset:

    ID
    1
    2
    2
    4

    If you performed the following query, this will generate the same error you had:

    INSERT INTO dbo.SomeTable (ID)
    SELECT V.ID
    FROM (VALUES(1),(2),(2),(4))V(ID)
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.SomeTable ST
                      WHERE ST.ID = V.ID);
    

    Violation of PRIMARY KEY constraint 'PK_SomeTable'. Cannot insert duplicate key in object 'dbo.SomeTable'. The duplicate key value is (2).

    This is because there are two rows with the value 2 for ID that are trying to be INSERTed. Both rows are trying to be INSERTed because there are no rows in the table with the value 2 at the time the INSERT occurs. If you want, you can validate what rows were tried to be INSERTed by commenting out the INSERT clause:

    --INSERT INTO dbo.SomeTable (ID)
    SELECT V.ID
    FROM (VALUES(1),(2),(2),(4))V(ID)
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.SomeTable ST
                      WHERE ST.ID = V.ID);
    
    ID
    2
    2
    4

    For your scenario, one method would be to use a CTE with ROW_NUMBER to limit the results to a single row for a single primary key value.

    WITH CTE AS
        (SELECT po.DocNumber,
                po.TenantId,
                po.CreatedById,
                po.OtherField,
                ROW_NUMBER() OVER (PARTITION BY po.DocNumber, po.TenantId ORDER BY <Column(s) to ORDER BY to determine desired row>) AS RN
         FROM dbo.SOURCE_TABLE po
         WHERE po.DeletedById IS NULL
           AND NOT EXISTS (SELECT *
                           FROM dbo.DESTINATION_TABLE poa
                           WHERE poa.DocNumber = po.DocNumber
                             AND poa.TenantId = po.TenantId))
    INSERT INTO dbo.DESTINATION_TABLE (DocNumber,
                                       TenantId,
                                       UserId,
                                       OtherField)
    SELECT DocNumber,
           TenantId,
           CreatedById,
           OtherField
    FROM CTE
    WHERE RN = 1;