In my case an entity is a recipe and a tag an ingredient. I want to search by ingredient and return all recipes containing the ingredient and return all other ingredients as well. In my case ingredients can have synonyms.
First, here are my tables:
CREATE TABLE `ingredient` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
`synonym_group` int DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE `recipe` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(100) DEFAULT NULL,
`type` varchar(45) DEFAULT NULL,
`comment` varchar(200) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE `recipe_ingredient` (
`recipe_id` int NOT NULL,
`ingredient_id` int NOT NULL,
`alternative_ingredient_id` int DEFAULT NULL,
`is_optional` tinyint(1) DEFAULT '0',
PRIMARY KEY (`recipe_id`,`ingredient_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Here is the query I've come up with:
SELECT
r.name,
i_all.name
FROM
recipe r
JOIN (
SELECT DISTINCT
recipe_id
FROM
ingredient i
JOIN recipe_ingredient ri ON i.id = ri.ingredient_id
LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group
WHERE i.name = 'Asd' OR i_syn.name = 'Asd'
) i ON i.recipe_id = r.id
JOIN recipe_ingredient ri ON i.recipe_id = ri.recipe_id
JOIN ingredient i_all ON ri.ingredient_id = i_all.id
I feel like this query is very complicated for what I want to do and there is a much simpler way. Mainly it feels weird that I have to specify the search term twice in the query.
Right now, if I only filter by i.name = 'Asd'
I won't find any synonyms and if I only filter by i_syn.name = 'Asd'
I won't find anything if the search term has no synonyms. Removing the distinct
doesn't seem to change anything.
Is there just no better way or am I thinking too complicated?
Example data for ingredients:
id | name | synonym_group |
---|---|---|
1 | Bell Pepper | 1 |
2 | Capsicum | 1 |
3 | Egg | NULL |
4 | Sugar | NULL |
You say that -
Removing the
distinct
doesn't seem to change anything.
which suggests your test data is inadequate. The following illustrates the importance of DISTINCT
in your current query. Given your ingredients listed above and the below recipes -
id | name | type | comment |
---|---|---|---|
1 | recipe 1 | type | comment |
2 | recipe 2 | type | comment |
3 | recipe 3 | type | comment |
4 | recipe 4 | type | comment |
5 | recipe 5 | type | comment |
recipe_id | ingredient_id | alternative_ingredient_id | is_optional |
---|---|---|---|
1 | 1 | NULL | 0 |
1 | 3 | NULL | 0 |
2 | 1 | NULL | 0 |
2 | 4 | NULL | 0 |
3 | 2 | NULL | 0 |
3 | 3 | NULL | 0 |
3 | 4 | NULL | 0 |
4 | 2 | NULL | 0 |
4 | 4 | NULL | 0 |
5 | 1 | NULL | 0 |
5 | 3 | NULL | 0 |
5 | 4 | NULL | 0 |
Your inner query -
SELECT DISTINCT
recipe_id
FROM
ingredient i
JOIN recipe_ingredient ri ON i.id = ri.ingredient_id
LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group
WHERE i.name = 'Bell Pepper' OR i_syn.name = 'Bell Pepper';
returns -
recipe_id |
---|
1 |
2 |
3 |
4 |
5 |
but if you drop the DISTINCT
it returns -
recipe_id |
---|
1 |
1 |
2 |
2 |
3 |
4 |
5 |
5 |
If you drop DISTINCT
but add an additional join predicate to stop it LEFT JOINing to itself you get the desired result -
SELECT
recipe_id
FROM
ingredient i
JOIN recipe_ingredient ri ON i.id = ri.ingredient_id
LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group AND i.id <> i_syn.id
WHERE i.name = 'Bell Pepper' OR i_syn.name = 'Bell Pepper';
More important than the query is the need to have the inverse index on the junction table -
ALTER TABLE recipe_ingredient
ADD INDEX IDX_ingredient_recipe (ingredient_id, recipe_id);
Your current query can be improved by nesting the ingredients list in another derived table and modified to allow for searches based on multiple ingredients with a best match type ranking. This is a bit of a hybrid of your original query and the solution proposed by P.Salmon -
SELECT r.name, i.name
FROM (
SELECT recipe_id, COUNT(DISTINCT ri.ingredient_id) AS num_matched
FROM recipe_ingredient ri
JOIN (
SELECT COALESCE(i2.id, i1.id) AS ingredient_id
FROM ingredient i1
LEFT JOIN ingredient i2 ON i1.synonym_group = i2.synonym_group
WHERE i1.name IN ( 'capsicum', 'onions' )
) all_syn ON ri.ingredient_id = all_syn.ingredient_id
GROUP BY recipe_id
ORDER BY num_matched DESC
LIMIT 0, 10
) mr
JOIN recipe r ON mr.recipe_id = r.id
JOIN recipe_ingredient ri ON r.id = ri.recipe_id
JOIN ingredient i ON ri.ingredient_id = i.id
ORDER BY mr.num_matched DESC, r.name ASC;
Here's a db<>fiddle
This query does not rely on the synonym_group being the id of one of its members.
Note the GROUP BY... ORDER BY... and LIMIT clauses on the middle layer query. With my test dataset of 39,774 recipes, 6,702 ingredients and 428,249 the server-side execution time is 13-16ms
Here's a little db<>fiddle to illustrate how the innermost query works to get the ingredients list.