Efficiently bulk import data in SSIS with occasional PK duplicate content?

Am regularly loading a flat file with 100k records into a table after some transformations. The table has a PK on two columns. The data on the whole does not contain duplicate PK information but occasionally, there are duplicates.

I naively didn't understand why SSIS was rejecting all my records when only some of them violated the PK constraint. I believe the problem is that during a bulk load, if even 1 of the rows violates the PK constraint, all rows in that batch get rejected.

If I alter the FastLoadMaxInsertCommitSize property of the OLE Db Destination to 1, if fixes the problem but it then runs like a dog as it's committing every 1 row.

In MySQL, the bulk load facility allows you to ignore PK errors and skip those rows without sacrificing performance. Does anyone know of a way to achieve this in SQL Server.

Any help much appreciated.

Solution

It sounds like you may be looking for IGNORE_DUP_KEY?

Using the IGNORE_DUP_KEY Option to Handle Duplicate Values

When you create or modify a unique index or constraint, you can set the IGNORE_DUP_KEY option ON or OFF. This option specifies the error response to duplicate key values in a multiple-row INSERT statement after the index has been created. When IGNORE_DUP_KEY is set to OFF (the default), the SQL Server Database Engine rejects all rows in the statement when one or more rows contain duplicate key values. When set to ON, only the rows that contain duplicate key values are rejected; the nonduplicate key values are added.

For example, if a single statement inserts 20 rows into a table with a unique index, and 10 of those rows contain duplicate key values, by default all 20 rows are rejected. However, if the index option IGNORE_DUP_KEY is set to ON, only the 10 duplicate key values will be rejected; the other 10 nonduplicate key values will be inserted into the table.