Search code examples
phpmysqlsqlsql-optimization

Optimizing MySQL Query with "NOT IN"


I've seen a few questions dabbling with the inefficiency of "NOT IN" in MySQL queries, but I didn't manage to reproduce the proposed solutions.

So I've got some sort of search engine. It starts with very simple queries, and then tries more complicated ones if it doesn't find enough results. Here's how it works in pseudocode

list_of_ids = do_simple_search()
nb_results = size_of(list_of_ids)

if nb_results < max_nb_results :
    list_of_ids .= do_search_where_id_not_in(list_of_ids)

    if nb_results < max_nb_results :
         list_of_ids .= do_complicated_search_where_id_not_in(list_of_ids)

Hope I'm clear. Anyway here's the slow query, as shown by MySQL-slow :

SELECT DISTINCT c.id 
FROM clients c LEFT JOIN communications co ON c.id = co.client_id 
WHERE (co.titre LIKE 'S' OR co.contenu LIKE 'S') AND c.id NOT IN(N)
LIMIT N, N

And here's an EXPLAIN on that query :

id  select_type     table   type    possible_keys               key         key_len ref             rows    Extra
1   SIMPLE          c       index   PRIMARY                     PRIMARY     2       NULL            25250   Using where; Using index; Using temporary
1   SIMPLE          co      ref     qui_com,id_client,titre     id_client   2       klients.c.id    8       Using where; Distinct

MySQL version is 5.1.63-0ubuntu0.11.04.1-log

Maybe my approach is wrong here ? How would you do it ? thanks.


Solution

  • A couple of remarks:

    1) Why do you do LEFT JOIN i/o (INNER) JOIN? LEFT JOIN means that you want to also get the records which are not matched agains clients, is that the intention? If not, then JOIN i/o LEFT JOIN is quicker.

    2) Why do you need JOIN at all if you can simply do:

    SELECT DISTINCT co.client_id from communications co 
    WHERE (co.titre LIKE 'S' OR co.contenu LIKE 'S') AND co.id!=N LIMIT N,N;
    

    Also, if you do a JOIN, both joined fields must be indexes, otherwise it's slow too.

    More importantly, you condition both client_id and id from communications table, but there is no common index for these both, which means more work to execute your query (hence using temporary which is in general not a good sign).

    3) You do a complex condition on both co.titre and co.contenu, you seem to have indexes but they are not used. That means this part might be potentially quite slow.