Search code examples
performancet-sqlmergesql-merge

Approaches that would allow to make the MERGE opearation in SQL work faster


Could you recommed approaches that would allow to make the MERGE opearation in SQL to work faster?

I believe that the question is all about knowledge and experience and should not be considered as opinion based, since anything which would make the operation faster is definitely appropriate for the question and the faster the operation will become, the better the answer is.

In my particular case I have approximately 1.7 million records, which I am fetching in a recurring job and I use the records to update the existing records. In order to lock the real table (it is [LegalContractors]) as little as possible, I am using a temporary table (it is [LegalContractorTemps]) into which I add all the records from non SQL (but C#) code and after that I run the MERGE.

Here is what I am trying:

            DELETE FROM [dbo].[LegalContractorTemps] WHERE [Code] IS NULL;

            DELETE FROM [dbo].[LegalContractorTemps]
            WHERE [Id] IN (
                SELECT [Id]
                FROM [dbo].[LegalContractorTemps] [Temp]
                JOIN (
                    SELECT [Code], [Status], MAX([Id]) as [MaxId]
                    FROM [dbo].[LegalContractorTemps]
                    GROUP BY [Code], [Status]
                    HAVING COUNT([Id]) > 1
                ) [TempGroup]
                ON ([Temp].[Code] = [TempGroup].[Code] AND [Temp].[Status] = [TempGroup].[Status] AND [MaxId] != [Id])
            );

            CREATE UNIQUE INDEX [CodeStatus]
            ON [dbo].[LegalContractorTemps] ([Code], [Status]);

            SELECT GETDATE() AS [beginTime];

            MERGE [dbo].[LegalContractors] AS TblTarget
            USING [dbo].[LegalContractorTemps] AS TblSource
                ON (TblSource.[Code] = TblTarget.[Code] AND TblSource.[Status] = TblTarget.[Status])
            WHEN NOT MATCHED BY TARGET THEN 
                INSERT ([Code], [ShortName], [Name], [LegalAddress], [Status], [LastModified]) 
                VALUES (TblSource.[Code], TblSource.[ShortName], TblSource.[Name], TblSource.[LegalAddress], TblSource.[Status], GETDATE())
            WHEN MATCHED AND 
            (TblTarget.[ShortName] != TblSource.[ShortName] OR 
            TblTarget.[Name] != TblSource.[Name] OR
            TblTarget.[LegalAddress] != TblSource.[LegalAddress]) THEN 
                UPDATE SET 
                TblTarget.[ShortName] = TblSource.[ShortName],
                TblTarget.[Name] = TblSource.[Name],
                TblTarget.[LegalAddress] = TblSource.[LegalAddress],
                TblTarget.[LastModified] = GETDATE()
            WHEN NOT MATCHED BY SOURCE THEN 
                DELETE;

            SELECT GETDATE() AS [endTime];

            DROP INDEX [CodeStatus] ON [dbo].[LegalContractorTemps];

Right now the code shown above runs approximately 2 minutes.

I found this answer, but I was not able to apply it to my case, because I need the WHEN NOT MATCHED clause and I will have to perform a full scan anyway (whether or nor I will use the MERGE).


Solution

  • I would consider doing a modified flush and fill, rather than doing a MERGE at all.

    The method I've had the most success with uses partition switching. You build three identical tables; the main table that your users pull from, a staging table that you use for applying CRUD operations, and a holding table that you'll only use during the transition period after your updates.

    This will require a little re-tooling to shift your LastModified logic right into the CRUD operations you're performing during your updates.

    Then, after the staging table is ready for prime time, truncate yesterday's copy of the holding table. Next, switch the data from the main table to the now-empty holding table. Switch the data from staging to main. Probably wrap all of that in an explicit transaction.

    Boom. Your table is up-to-date. And you have a back up copy of yesterday's data in the holding table, just in case.

    Tons of additional detail in these articles:

    Comparison: Switching Tables vs. sp_rename

    Why You Should Switch in Staging Tables Instead of Renaming Them