I have two tables related many-to-many via third connecting table: products and categories. Each product can be in a several categories. It's a typical many-to-many realationship:
products
-------------
id
product_name
categories
-------------
id
category_name
products_to_categories
-------------
product_id
caregory_id
I want to allow users to do a search for the products, that are in some of the chosen categories and are not in other chosen categories at the same time.
Example: find all producst that are in categories "Computers" and "Software", but are not in categories "Games", "Programming" and "Education".
Here is the query I designed to do this:
SELECT product_name
FROM products
WHERE
EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 1 AND product_id = products.id)
AND EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 2 AND product_id = products.id)
AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 3 AND product_id = products.id)
AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 4 AND product_id = products.id)
AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 5 AND product_id = products.id)
ORDER BY id
It works. But it's so incredibly slow, that I just can't use it in production. All idexes are in place, but this query results in 5 dependant subqueries and the tables are huge.
Is there a way to solve the same task without dependant subqueries or optimize this query in some other way?
UPDATE
Indexes are:
products: PRIMARY KEY (id)
categories: PRIMARY KEY (id)
products_to_categories: PRIMARY KEY (product_id, caregory_id)
All tables are InnoDB
Please post the tables' definitions (so the engine used and the indexes defined are shown).
You could also post the execution plan of your query (using the EXPLAIN
statement).
You could also try rewriting the query in various ways. Here's one:
SELECT p.product_name
FROM products AS p
JOIN products_to_categories AS pc1
ON pc1.category_id = 1
AND pc1.product_id = p.id
JOIN products_to_categories AS pc2
ON pc2.category_id = 2
AND pc2.product_id = p.id
WHERE
NOT EXISTS
( SELECT *
FROM products_to_categories AS pc
WHERE pc.category_id IN (3, 4, 5)
AND pc.product_id = p.id
)
Update: You don't have a (category_id, product_id)
index. Try adding it.