Basically, I want to copy table DET to table DET_NEW. I know that DET has some rows duplicated so I want to do it inserting duplicated rows just once. As you can see, I want to copy these rows by periods using HDR table.
Edit to add info: DET has no primary keys (that's why it has duplicated rows). DET_NEW has a composite primary key with FK1 and FK2.
PROBLEM:
I know that this is not a new topic. I've tried all the solutions I've found and SQL Server returns an error about primary key constraint violation ("cannot insert duplicate key"). The queries were executed explicitly just with one core.
I saw that MySQL has INSERT IGNORE
but I think there is nothing similar in SQL Server.
My solutions are the following ones:
Query 1:
INSERT INTO DET_NEW (FK1, FK2, value)
SELECT DISTINCT D.FK1, D.FK2, D.value
FROM HDR AS H, DET AS D
WHERE H.Date >= CONVERT( datetime, '2015-01-01 00:00:00', 20 )
AND H.Date < CONVERT( datetime, '2016-01-01 00:00:00', 20 )
AND H.FK1 = D.FK1
Query 2:
INSERT INTO DET_NEW (FK1, FK2, value)
SELECT D.FK1, D.FK2, D.value
FROM
(SELECT DISTINCT D.FK1, D.FK2, D.value
FROM HDR AS H, DET AS D
WHERE H.Date >= CONVERT( datetime, '2015-01-01 00:00:00', 20)
AND H.Date < CONVERT( datetime, '2016-01-01 00:00:00', 20 )
AND H.FK1 = D.FK1) D
Query 3:
INSERT INTO DET_NEW (FK1, FK2, value)
SELECT D.FK1, D.FK2, D.value
FROM HDR AS H, DET AS D
WHERE H.Date >= CONVERT(datetime, '2018-02-01 00:00:00', 20)
AND H.Date < CONVERT(datetime, '2018-02-05 00:00:00', 20)
AND H.FK1 = D.FK1
GROUP BY D.FK1, D.FK2
Query 4:
WITH cte AS
(
SELECT
D.FK1, D.FK2, D.value,
ROW_NUMBER() OVER (PARTITION BY D.FK1, D.FK2, D.value ORDER BY D.FK1) AS [rn]
FROM
HDR AS H, DET AS D
WHERE
H.Date >= CONVERT(datetime, '2018-02-01 00:00:00', 20)
AND H.Date < CONVERT(datetime, '2018-02-03 00:00:00', 20)
AND H.FK1 = D.FK1
)
INSERT INTO DET_NEW (FK1, FK2, value)
SELECT cte.FK1, cte.FK2, cte.value
FROM cte
WHERE cte.[rn] = 1
ADDITIONAL INFO
I executed the following query: in the external part, the code checks if there is any duplication; on the inside, there is a SELECT DISTINCT
, the same as Query 2. The results are more than 1k rows, so I've something incorrect in the queries.
SELECT D.FK1, D.FK2, COUNT(D.FK1) AS count
FROM (
SELECT DISTINCT D.FK1, D.FK2, D.value
FROM HDR AS H,
DET AS D
WHERE H.Date >= CONVERT( datetime, '2018-02-01 00:00:00', 20 )
and H.Date < CONVERT( datetime, '2018-02-03 00:00:00', 20 )
and H.FK1 = D.FK1
)D
GROUP BY D.FK1, D.FK2
HAVING
COUNT(*) > 1
The questions are: why am I inserting duplicates? Is there another efficient way for a large database (billions of rows)?
It's likely that for some combinations of FK1
and FK2
, there is more than one different value
in DET
. The following query should demonstrate that
SELECT FK1, FK2, MIN(value), MAX(value), COUNT(DISTINCT value)
FROM DET
GROUP BY FK1, FK2
HAVING COUNT(DISTINCT value) > 1
You'll either need to include value in the composite key in DET_NEW
or work out how you want to select which value to use for each key, e.g. using a MIN
or MAX
. The query that @MoinulIslam provided will also help you select a single value for each key. In that query it is just choosing the first one.