Left outer join on 2 columns not running properly

a is a table of 5,000,000 records and a score. b is a table of 100,000 records and a (different) score. I want to output every record from table a as long as the same record is not in table b with a score equal to 1.

The following query I wrote has taken 15 hours to execute and is still running. Is it possible that it could take this long to do such a query or is there perhaps something wrong with my code?

SELECT a.c1, a.c2, a.score
INTO OUTFILE 'file.csv'
FROM a
LEFT OUTER JOIN b
ON a.c1=b.c1
AND a.c2=b.c2
WHERE b.score <> 1

Solution

Your query retrieves all records from table a that are also in table b with a score not equal to 1 (and it retrieves those records as many times as they appear in table b with a score other than 1). Which is different from "every record from table a as long as the same record is not in table b with a score equal to 1".

SELECT a.c1, a.c2, a.score
INTO OUTFILE 'file.csv'
FROM a
WHERE NOT EXISTS (
    SELECT 1 FROM b
    WHERE a.c1 = b.c1
    AND a.c2 = b.c2
    AND b.score = 1
)

The query above can take advantage of the following index

create index my_idx on b(c1,c2,score)