I have written a search query which searches for similar names. It works with Power Set of tags and it sorts by similarity. for example if the search text is: shakespeare tragedy hamlet
the SQL generated is:
SELECT DISTINCT id FROM (
(SELECT * FROM books
WHERE name LIKE '%shakespeare%'
AND name LIKE '%tragedy%'
AND name LIKE '%hamlet%' limit 10)
UNION
(SELECT * FROM books
WHERE name LIKE '%shakespeare%'
AND name LIKE '%tragedy%' limit 10)
UNION
(SELECT * FROM books
WHERE name LIKE '%shakespeare%'
AND name LIKE '%hamlet%' limit 10)
UNION
(SELECT * FROM books
WHERE name LIKE '%tragedy%'
AND name LIKE '%hamlet%' limit 10)
UNION
(SELECT * FROM books WHERE name LIKE '%shakespeare%' limit 10)
UNION
(SELECT * FROM books WHERE name LIKE '%tragedy%' limit 10)
UNION
(SELECT * FROM books WHERE name LIKE '%hamlet%' limit 10)
) limit 10
there are two problems:
The Power Set creates 2^tags - 1
unions in my query, which means if some one wants to be precise and uses 6 tags, it will be 63 unions and it makes my query much slower.
if the first union returns 10 rows, others are useless.
Is there a way to optimize this query?
We can get all boosk where name is similar to past tag and add a custom ORDER BY based on similarity. If name contains tag +1 if not 0. SO if name contains all the 3 tags sum is 3 if just one sum is 1.
SELECT DISTINCT id
FROM books
where name LIKE '%shakespeare%'
OR name LIKE '%tragedy%'
OR name LIKE '%hamlet%'
ORDER BY IF(INSTR(name, 'shakespeare')>0,1,0)+
IF(INSTR(name, 'tragedy')>0,1,0)+
IF(INSTR(name, 'hamlet')>0,1,0) DESC
LIMIT 10
UPDATE: ORDER BY could be based on sum or just commas