Search code examples
sqlmysqlperformancequery-optimization

Is there any alternative for WHERE NOT IN(..)?


I have the following query:

SELECT *
FROM customers_cards
WHERE id NOT IN ( SELECT card_id FROM customers_card_psp)

It takes about 2sec to get executed and we need this query to get called on a highly visited page. Any idea how can I make it more optimize?

Here is the result of EXPLAIN:

enter image description here

Noted that card_id is not unique (there are 5 rows inside customers_card_psp per 1 row inside customers_cards), but adding DISTINCT in the behind of card_id doesn't make anything better.


Noted that, using LEFT JOIN has the same performance:

SELECT cc.*
FROM customers_cards cc
LEFT JOIN customers_card_psp ccp ON ccp.card_id = cc.id
WHERE ccp.id IS null

enter image description here


Solution

  • These are 3 ways to say the same thing:

    SELECT *
        FROM customers_cards
        WHERE id NOT IN (
            SELECT card_id FROM customers_card_psp )
    

    vs

    SELECT *
        FROM customers_cards AS cc
        WHERE NOT EXISTS (
            SELECT 1               -- anything works here; "1" is the convention
                FROM customers_card_psp
                WHERE card_id = cc.id )
    

    vs

    SELECT cc.*
        FROM customers_cards AS cc
        LEFT JOIN customers_card_psp AS psp
               ON psp.card_id = cc.id       -- How the tables 'relate'
        WHERE psp.card_id IS NULL           -- Means "missing from psp"
    

    The NOT IN ( SELECT ... ) is notoriously inefficient and should be avoided.

    The EXISTS and LEFT JOIN are probably compiled identically, hence equally good.

    The EXISTS is called "semi-join" meaning that it only needs to find 0 or 1 row, not all 5, in order to satisfy the test.

    Related: The comparisons are probably the same without the NOT (and changing to IS NULL). That is, do not use IN; use either of the others.

    (Different versions of MySQL/MariaDB have optimized these things differently.)