Search code examples
sqldb2subquerywhere-clause

DB2 - find same rec between 2 big files


I have 2 files with same structure (20 fields) and I must find if there exist equal record. The problem is that this files have more that 40 millions recs.

So I have start a SELECT like:

SELECT * FROM TABLE_A
  WHERE FIELD1||FIELD2||FIELD3||FIELD4...etc
  IN (SELECT FIELD1||FIELD2||FIELD3||FIELD4...etc
        FROM TABLE_B) 

I have started more that 24 hours ago.

Can anybody tell me please if there is a more quick solution and how can I improved it please ?

Thanks all in advance

Denis


Solution

  • I would recommend exists:

    select *
    from table_a a
    where exists (
        select 1
        from table_b b
        where b.field1 = a.field1 and b.field2 = a.field2 and ...
    );
    

    The benefits are:

    • correctness: this checks for exact matches, while the method that concatenates the string may generate fake positives (not to mention conversion problems if some columns are not strings)

    • performance: the expression in the from clause of the subquery is SARGable; if you were to create an index on at least a subset of these 40 columns, the database probably could take advantage of it when executing the query

    Note: this does not properly handle null values; if you want to consider null values as equal, then you need more complicated expressions:

    where 
        (b.field1 = a.field1 or (b.field1 is null and a.field1 is null))
        and (b.field2 = a.field2 or (b.field2 is null and a.field2 is null))
        and ...