There is a product table that has a ManyToMany link to Categories.
My goal is to select the top 5 most suitable products for the given categories.
Now I have implemented such a query, but it is not very effective, because at first it finds a match by the number of categories, and by the first match with the category name.
@Query("SELECT p FROM Product p JOIN p.categories c WHERE c.name IN (:categories) ORDER BY SIZE(p.categories) DESC")
List<Product> findByCategories_NameInOrderByCount(@Param("categories") Collection<String> categories);
How can I improve my query so that it will be the first to return products with the largest number of categories matching the name?
Your query is wrong, you should use something similar to this:
@Query("SELECT p FROM Product p WHERE (SELECT COUNT(*) FROM p.categories c WHERE c.name IN (:categories)) > 0 ORDER BY (SELECT COUNT(*) FROM p.categories c WHERE c.name IN (:categories)) DESC")
List<Product> findByCategories_NameInOrderByCount(@Param("categories")
Collection<String> categories);
Anyway, this will probably not perform very well either because the database will have to join and sort the three tables. To avoid this, one normally uses an inverted or multi-valued index as MySQL calls it (https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued). Drop the product_has_category and category tables and instead introduce a JSON column "details" in the product table.
You can then create an index:
CREAT INDEX categories ON product( (CAST(JSON_EXTRACT(categories, '$[*]') AS VARCHAR(80) ARRAY)) )
And query it like this:
@Query("SELECT p FROM Product p WHERE FUNCTION('JSON_OVERLAPS', FUNCTION('JSON_EXTRACT', p.categories, '$[*]'), :categories) = 1 ORDER BY FUNCTION('COUNT_MATCHES', p.categories, :categories) DESC")
List<Product> findByCategories_NameInOrderByCount(@Param("categories")
String categories);
But you need to add a function to your database:
CREATE FUNCTION COUNT_MATCHES (categories JSON, searchCategories JSON)
RETURNS INT DETERMINISTIC
RETURN (
SELECT COUNT(*)
FROM JSON_TABLE(categories, '$[*]' COLUMNS(
name VARCHAR(80) PATH '$' ERROR ON ERROR)
) c
JOIN JSON_TABLE(searchCategories, '$[*]' COLUMNS(
name VARCHAR(80) PATH '$' ERROR ON ERROR )
) s ON c.name = s.name
);