Search code examples
sqlsql-serverselectexists

Huge amount of data from a source table where not exists to destination table


I am having some trouble selecting data from source table to the destination table to find out which data from the source table that has not yet completely integrated to the destination table.

The source table is another DBMS, which in SQL Server we used a linked server to access that DBMS of the source table and the integration is pretty much just straight forward from column to column of the source to destination (No other calculation)

When I execute a select statement like this

SELECT A.*  
FROM [ORCLE_DB]..GROUP.TABLEA AS A 
WHERE NOT EXISTS (SELECT 1 FROM TABLEA as B WHERE A.ID = B.ID)

It took forever to select the data, and the amount of data is pretty huge, about 20 million rows of data.

Is there any other ways in order to select these rows, that may allow SELECT execution to be done efficiently and faster? Thank you so much, any ideas and advice will be so much appreciated


Solution

  • You are likely falling foul of a distributed query going 'N+1'. The heuristics are somewhat arcane in the way they penalise network speed. You can verify this using SQL Profiler.

    If so, you can fix by:

    1. creating a local temp table to house all the required data from the linked server remote table(s) [and apply any differing collation to temp table columns]
    2. insert the remote data into the local temp table
    3. join the local table with the temp table.