When I execute the following query:
SELECT * FROM `table1`
INNER JOIN table2 ON table2.number = table1.number
I get the result within 2 seconds. There are about 6 millions records in table2
and 1 million records in table1
table2.number
and table1.number
are indexed.
Now I want to get a list of numbers that not exist. Like this:
SELECT * FROM `table1`
INNER JOIN table2 ON table2.number != table1.number
It take forever and still hanging.. How to fix?
Let's say your first INNER JOIN
returns 75% of the 1,000,000 rows in table1
. The second query does not return the 250,000 other rows as you think. Instead, it attempts to create a Cartesian product and remove the 750,000 matching rows. Thus it's trying to return (6,000,000-750,000)×(1,000,000-750,000) rows. That's a bulging 1.3125×1012 row result set.
You probably want this:
SELECT * FROM table1
LEFT JOIN table2 ON table2.number = table1.number
WHERE table2.number IS NULL
This returns rows in table1
not present in table2
.
You might also be interested in FULL OUTER JOIN
:
SELECT * FROM table1
FULL OUTER JOIN table2 ON table2.number = table1.number
WHERE table1.number IS NULL OR table2.number IS NULL
This returns rows in both tables that don't have a match on the other table.