I am currently stuck in a situation. The scenario is this. I have products who may be associated with multiple categories. The data structure is shown below:
Products Table:
product_id name
1 Lemon
2 Kiwis
3 Cheese
Product to Categories Table
product_id category_id
1 1
1 2
1 3
2 1
2 3
3 2
3 4
Category Table (not required in query however adding it here to help visualize what is happening)
category_id name
1 Fruit
2 Yellow
3 Round
4 Dairy
What I'm struggling with here is that originally I want to get all products that are in the fruit category (category id 1) but I also want to check if a fruit is yellow. Keep in mind that yellow will not be the only filter, sometimes I will want to return yellow and orange fruit, however since cheese is yellow I can't return it since it isn't a fruit. However to make things a bit easier I always know that I am going to look in the fruit category as a base.
The database structure can not change as its an opencart database structure.
Here are my attempts:
SELECT GROUP_CONCAT(DISTINCT p2c2.category_id SEPARATOR ',') as categories
FROM oc_product_to_category p2c
LEFT JOIN oc_product p ON (p.product_id = p2c.product_id)
LEFT JOIN oc_product_to_category p2c2 ON (p.product_id = p2c2.product_id)
WHERE p2c.category_id IN ('1','2')
This kind of works except for that fact that it will return Cheese.
Notes: I use Group Concat because at the end of all of this my goal is to return not so much the products that match these categories but based on the filters I want to return another list of categories from the products that match this criteria. So:
Scenario:
Get Products that match category criteria Return categories of those products.
Any assistance will be greatly appreciated.
This type of problem is called relational division.
There are two common solutions:
First solution strings together the matching categories and compares to a fixed string:
SELECT p2c.product_id
FROM oc_product_to_category p2c
GROUP BY p2c.product_id
HAVING GROUP_CONCAT(p2c.category_id SEPARATOR ',' ORDER BY p2c.category_id) = '1,2'
Second solution does a JOIN
for each required value:
SELECT p.product_id
FROM oc_product p
INNER JOIN oc_product_to_category p2c1
ON (p.product_id = p2c1.product_id AND p2c1.category_id = 1)
INNER JOIN oc_product_to_category p2c2
ON (p.product_id = p2c2.product_id AND p2c2.category_id = 2)
I cover these solutions in my presentation SQL Query Patterns, Optimized. I found in my tests that the join solution is much better for performance.
@Tom's suggestion is right, here's what that would look like in a complete query:
SELECT p.product_id, GROUP_CONCAT(p2c3.category_id SEPARATOR ',') AS categories
FROM oc_product p
INNER JOIN oc_product_to_category p2c1
ON (p.product_id = p2c1.product_id AND p2c1.category_id = 1)
INNER JOIN oc_product_to_category p2c2
ON (p.product_id = p2c2.product_id AND p2c2.category_id = 2)
INNER JOIN oc_product_to_category p2c3
ON (p.product_id = p2c3.product_id)
GROUP BY p.product_id;
The DISTINCT
that @Tom suggests shouldn't be necessary, because your p2c table should have a UNIQUE constraint over (product_id, category_id).