Search entity by tag and return all other tags as well in MySQL

In my case an entity is a recipe and a tag an ingredient. I want to search by ingredient and return all recipes containing the ingredient and return all other ingredients as well. In my case ingredients can have synonyms.

First, here are my tables:

CREATE TABLE `ingredient` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(45) DEFAULT NULL,
  `synonym_group` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `recipe` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT NULL,
  `type` varchar(45) DEFAULT NULL,
  `comment` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `recipe_ingredient` (
  `recipe_id` int NOT NULL,
  `ingredient_id` int NOT NULL,
  `alternative_ingredient_id` int DEFAULT NULL,
  `is_optional` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`recipe_id`,`ingredient_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Here is the query I've come up with:

SELECT 
  r.name, 
  i_all.name

FROM
  recipe r
  
JOIN (
    SELECT DISTINCT 
        recipe_id 
    FROM 
        ingredient i 
        JOIN recipe_ingredient ri ON i.id = ri.ingredient_id 
        LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group 
    WHERE i.name = 'Asd' OR i_syn.name = 'Asd'
) i ON i.recipe_id = r.id 

JOIN recipe_ingredient ri ON i.recipe_id = ri.recipe_id 
JOIN ingredient i_all ON ri.ingredient_id = i_all.id

I feel like this query is very complicated for what I want to do and there is a much simpler way. Mainly it feels weird that I have to specify the search term twice in the query. Right now, if I only filter by i.name = 'Asd' I won't find any synonyms and if I only filter by i_syn.name = 'Asd' I won't find anything if the search term has no synonyms. Removing the distinct doesn't seem to change anything.

Is there just no better way or am I thinking too complicated?

Example data for ingredients:

id	name	synonym_group
1	Bell Pepper	1
2	Capsicum	1
3	Egg	NULL
4	Sugar	NULL

Solution

You say that -

Removing the distinct doesn't seem to change anything.

which suggests your test data is inadequate. The following illustrates the importance of DISTINCT in your current query. Given your ingredients listed above and the below recipes -

id	name	type	comment
1	recipe 1	type	comment
2	recipe 2	type	comment
3	recipe 3	type	comment
4	recipe 4	type	comment
5	recipe 5	type	comment

recipe_id	ingredient_id	alternative_ingredient_id
1	1	NULL
1	3	NULL
2	1	NULL
2	4	NULL
3	2	NULL
3	3	NULL
3	4	NULL
4	2	NULL
4	4	NULL
5	1	NULL
5	3	NULL
5	4	NULL

Your inner query -

SELECT DISTINCT 
    recipe_id 
FROM 
    ingredient i 
    JOIN recipe_ingredient ri ON i.id = ri.ingredient_id 
    LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group
WHERE i.name = 'Bell Pepper' OR i_syn.name = 'Bell Pepper';

returns -

recipe_id
1
2
3
4
5

but if you drop the DISTINCT it returns -

recipe_id
1
1
2
2
3
4
5
5

If you drop DISTINCT but add an additional join predicate to stop it LEFT JOINing to itself you get the desired result -

SELECT
    recipe_id 
FROM 
    ingredient i 
    JOIN recipe_ingredient ri ON i.id = ri.ingredient_id 
    LEFT JOIN ingredient i_syn ON i.synonym_group = i_syn.synonym_group AND i.id <> i_syn.id
WHERE i.name = 'Bell Pepper' OR i_syn.name = 'Bell Pepper';

More important than the query is the need to have the inverse index on the junction table -

ALTER TABLE recipe_ingredient
    ADD INDEX IDX_ingredient_recipe (ingredient_id, recipe_id);

Your current query can be improved by nesting the ingredients list in another derived table and modified to allow for searches based on multiple ingredients with a best match type ranking. This is a bit of a hybrid of your original query and the solution proposed by P.Salmon -

SELECT r.name, i.name
FROM (
    SELECT recipe_id, COUNT(DISTINCT ri.ingredient_id) AS num_matched
    FROM recipe_ingredient ri
    JOIN (
        SELECT COALESCE(i2.id, i1.id) AS ingredient_id
        FROM ingredient i1
        LEFT JOIN ingredient i2 ON i1.synonym_group = i2.synonym_group
        WHERE i1.name IN ( 'capsicum', 'onions' )
    ) all_syn ON ri.ingredient_id = all_syn.ingredient_id
    GROUP BY recipe_id
    ORDER BY num_matched DESC
    LIMIT 0, 10
) mr
JOIN recipe r ON mr.recipe_id = r.id
JOIN recipe_ingredient ri ON r.id = ri.recipe_id
JOIN ingredient i ON ri.ingredient_id = i.id
ORDER BY mr.num_matched DESC, r.name ASC;

Here's a db<>fiddle

This query does not rely on the synonym_group being the id of one of its members.

Note the GROUP BY... ORDER BY... and LIMIT clauses on the middle layer query. With my test dataset of 39,774 recipes, 6,702 ingredients and 428,249 the server-side execution time is 13-16ms

Here's a little db<>fiddle to illustrate how the innermost query works to get the ingredients list.