Could you recommed approaches that would allow to make the MERGE opearation in SQL to work faster?
I believe that the question is all about knowledge and experience and should not be considered as opinion based, since anything which would make the operation faster is definitely appropriate for the question and the faster the operation will become, the better the answer is.
In my particular case I have approximately 1.7 million records, which I am fetching in a recurring job and I use the records to update the existing records. In order to lock the real table (it is [LegalContractors]
) as little as possible, I am using a temporary table (it is [LegalContractorTemps]
) into which I add all the records from non SQL (but C#) code and after that I run the MERGE
.
Here is what I am trying:
DELETE FROM [dbo].[LegalContractorTemps] WHERE [Code] IS NULL;
DELETE FROM [dbo].[LegalContractorTemps]
WHERE [Id] IN (
SELECT [Id]
FROM [dbo].[LegalContractorTemps] [Temp]
JOIN (
SELECT [Code], [Status], MAX([Id]) as [MaxId]
FROM [dbo].[LegalContractorTemps]
GROUP BY [Code], [Status]
HAVING COUNT([Id]) > 1
) [TempGroup]
ON ([Temp].[Code] = [TempGroup].[Code] AND [Temp].[Status] = [TempGroup].[Status] AND [MaxId] != [Id])
);
CREATE UNIQUE INDEX [CodeStatus]
ON [dbo].[LegalContractorTemps] ([Code], [Status]);
SELECT GETDATE() AS [beginTime];
MERGE [dbo].[LegalContractors] AS TblTarget
USING [dbo].[LegalContractorTemps] AS TblSource
ON (TblSource.[Code] = TblTarget.[Code] AND TblSource.[Status] = TblTarget.[Status])
WHEN NOT MATCHED BY TARGET THEN
INSERT ([Code], [ShortName], [Name], [LegalAddress], [Status], [LastModified])
VALUES (TblSource.[Code], TblSource.[ShortName], TblSource.[Name], TblSource.[LegalAddress], TblSource.[Status], GETDATE())
WHEN MATCHED AND
(TblTarget.[ShortName] != TblSource.[ShortName] OR
TblTarget.[Name] != TblSource.[Name] OR
TblTarget.[LegalAddress] != TblSource.[LegalAddress]) THEN
UPDATE SET
TblTarget.[ShortName] = TblSource.[ShortName],
TblTarget.[Name] = TblSource.[Name],
TblTarget.[LegalAddress] = TblSource.[LegalAddress],
TblTarget.[LastModified] = GETDATE()
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
SELECT GETDATE() AS [endTime];
DROP INDEX [CodeStatus] ON [dbo].[LegalContractorTemps];
Right now the code shown above runs approximately 2 minutes.
I found this answer, but I was not able to apply it to my case, because I need the WHEN NOT MATCHED
clause and I will have to perform a full scan anyway (whether or nor I will use the MERGE
).
I would consider doing a modified flush and fill, rather than doing a MERGE
at all.
The method I've had the most success with uses partition switching. You build three identical tables; the main table that your users pull from, a staging table that you use for applying CRUD
operations, and a holding table that you'll only use during the transition period after your updates.
This will require a little re-tooling to shift your LastModified
logic right into the CRUD
operations you're performing during your updates.
Then, after the staging table is ready for prime time, truncate yesterday's copy of the holding table. Next, switch the data from the main table to the now-empty holding table. Switch the data from staging to main. Probably wrap all of that in an explicit transaction.
Boom. Your table is up-to-date. And you have a back up copy of yesterday's data in the holding table, just in case.
Tons of additional detail in these articles:
Comparison: Switching Tables vs. sp_rename
Why You Should Switch in Staging Tables Instead of Renaming Them