Search code examples
sqlsql-serverquery-optimizationsql-optimizationsql-merge

SQL Merge - How can I optimize this?


Table A (table to merge into) has 90,000 rows Table B (source table) has 3,677 rows I would expect this to merge really quick but it's taking 30 minutes (and counting). How can it be optimized to run faster?

ALTER PROCEDURE [dbo].[MergeAddressFromGraph] 
-- no params

AS BEGIN -- SET NOCOUNT ON added to prevent extra result sets from -- interfering with SELECT statements. SET NOCOUNT ON;

-- first add fids to the MergeFundraiserNameAddress table instead of the temp table?

SELECT fundraiserid, ein
INTO #fids 
FROM  bb02_fundraiser

BEGIN TRAN;
MERGE BB02_FundraiserNameAddress AS T
USING
(    
    select f.fundraiserid,
           n.addresslines,
           n.town,
           n.county,
           n.postcode,
           n.country,
           n.fulladdress,
           n.ein
    from MergeFundraiserNameAddress n
         join bb02_fundraiser f
         on f.ein = n.ein and f.isdefault = 1
    group by n.ein,
             f.fundraiserid,
             n.addresslines,
             n.town,
             n.county,
             n.postcode,
             n.country,
             n.fulladdress

) AS S
ON (T.fundraiserid in( (select fundraiserid from #fids where ein = S.ein)) )

WHEN MATCHED
    THEN UPDATE
        SET    
              -- ADDRESS
              T.addresslines = S.addresslines
              ,T.town = S.town
              ,T.county = S.county
              ,T.postcode = S.postcode
              ,T.country = S.country
              ,T.fulladdress = S.fulladdress

;

DELETE FROM MergeFundraiserNameAddress

COMMIT TRAN;

drop table #fids

END

UPDATE I was able to improve the stored procedure which now runs in just a few seconds. I joined on the temp table instead of the bb02_fundraiser table and removed the subquery in the ON clause.

I realize now that the Merge is not necessary and I could have used an Update instead, but I'm ok with this right now because an INSERT may be needed soon in a refactor.

UPDATED STORED PROCEDURE BELOW IF OBJECT_ID('tempdb..#fids') IS NOT NULL DROP TABLE #fids

SELECT fundraiserid, ein
INTO #fids 
FROM  bb02_fundraiser
where isdefault = 1

BEGIN TRAN;
MERGE BB02_FundraiserNameAddress AS T
USING
(    
    select f.fundraiserid,
           n.addresslines,
           n.town,
           n.county,
           n.postcode,
           n.country,
           n.fulladdress,
           n.ein
    from MergeFundraiserNameAddress n
         join #fids f
         on f.ein = n.ein
    group by n.ein,
             f.fundraiserid,
             n.addresslines,
             n.town,
             n.county,
             n.postcode,
             n.country,
             n.fulladdress

) AS S
ON (T.fundraiserid = S.fundraiserid)

WHEN MATCHED
    THEN UPDATE
        SET    
              -- ADDRESS
              T.addresslines = S.addresslines
              ,T.town = S.town
              ,T.county = S.county
              ,T.postcode = S.postcode
              ,T.country = S.country
              ,T.fulladdress = S.fulladdress

;

DELETE FROM MergeFundraiserNameAddress

COMMIT TRAN;

IF OBJECT_ID('tempdb..#fids') IS NOT NULL
DROP TABLE #fids

Solution

  • See below if this statement alone does the job for you.

    UPDATE T
    SET    T.addresslines = n.addresslines
          ,T.town = n.town
          ,T.county = n.county
          ,T.postcode = n.postcode
          ,T.country = n.country
          ,T.fulladdress = n.fulladdress
        from MergeFundraiserNameAddress n join bb02_fundraiser f
        on f.ein = n.ein and f.isdefault = 1
        INNER JOIN  BB02_FundraiserNameAddress T
        ON T.fundraiserid = f.fundraiserid AND T.ein = f.ein
    group by n.ein,
                 f.fundraiserid,
                 n.addresslines,
                 n.town,
                 n.county,
                 n.postcode,
                 n.country,
                 n.fulladdress
    

    As other users has mentioned in your comments, why use MERGE statement when you're only updating records. MERGE statement is used when you are doing multiple operation such as UPDATE , DELETE and INSERT.

    Since you are only UPDATING records there is no need for merge statement.

    Reason For Slow Performance

    Since you are getting all the records in a Temp table and then joining it with other tables and not creating any indexes on that Temp table, The absence of any indexes will hurt the query performance.

    When you do a SELECT * INTO #TempTable FROM Some_Table it will bring all the data from Some_Table into a Temp table but not the indexes. you can see your self by running this simple query

    select * from tempdb.sys.indexes
    where object_id = (select object_id 
                       from tempdb.sys.objects 
                       where name LIKE '#TempTable%')