Search code examples
mysqlmany-to-manyinnodbsql-optimization

Efficient search in many-to-many related tables


I have two tables related many-to-many via third connecting table: products and categories. Each product can be in a several categories. It's a typical many-to-many realationship:

products
-------------
id
product_name


categories
-------------
id
category_name


products_to_categories
-------------
product_id
caregory_id

I want to allow users to do a search for the products, that are in some of the chosen categories and are not in other chosen categories at the same time.

Example: find all producst that are in categories "Computers" and "Software", but are not in categories "Games", "Programming" and "Education".

Here is the query I designed to do this:

SELECT product_name
FROM products
WHERE
    EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 1 AND product_id = products.id) 
    AND EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 2 AND product_id = products.id) 
    AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 3 AND product_id = products.id)
    AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 4 AND product_id = products.id) 
    AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 5 AND product_id = products.id)
ORDER BY id

It works. But it's so incredibly slow, that I just can't use it in production. All idexes are in place, but this query results in 5 dependant subqueries and the tables are huge.

Is there a way to solve the same task without dependant subqueries or optimize this query in some other way?

UPDATE

Indexes are:

products: PRIMARY KEY (id)
categories: PRIMARY KEY (id)
products_to_categories: PRIMARY KEY (product_id, caregory_id)

All tables are InnoDB


Solution

  • Please post the tables' definitions (so the engine used and the indexes defined are shown).

    You could also post the execution plan of your query (using the EXPLAIN statement).

    You could also try rewriting the query in various ways. Here's one:

    SELECT p.product_name
    FROM products  AS p
      JOIN products_to_categories  AS pc1
        ON pc1.category_id = 1 
        AND pc1.product_id = p.id
      JOIN products_to_categories  AS pc2
        ON  pc2.category_id = 2 
        AND pc2.product_id = p.id
    WHERE
        NOT EXISTS 
        ( SELECT * 
          FROM products_to_categories  AS pc 
          WHERE pc.category_id IN (3, 4, 5)
            AND pc.product_id = p.id
        )
    

    Update: You don't have a (category_id, product_id) index. Try adding it.