Search code examples
sqlheidisql

Left outer join on 2 columns not running properly


a is a table of 5,000,000 records and a score. b is a table of 100,000 records and a (different) score. I want to output every record from table a as long as the same record is not in table b with a score equal to 1.

The following query I wrote has taken 15 hours to execute and is still running. Is it possible that it could take this long to do such a query or is there perhaps something wrong with my code?

SELECT a.c1, a.c2, a.score
INTO OUTFILE 'file.csv'
FROM a
LEFT OUTER JOIN b
ON a.c1=b.c1
AND a.c2=b.c2
WHERE b.score <> 1

Solution

  • Your query retrieves all records from table a that are also in table b with a score not equal to 1 (and it retrieves those records as many times as they appear in table b with a score other than 1). Which is different from "every record from table a as long as the same record is not in table b with a score equal to 1".

    SELECT a.c1, a.c2, a.score
    INTO OUTFILE 'file.csv'
    FROM a
    WHERE NOT EXISTS (
        SELECT 1 FROM b
        WHERE a.c1 = b.c1
        AND a.c2 = b.c2
        AND b.score = 1
    )
    

    The query above can take advantage of the following index

    create index my_idx on b(c1,c2,score)