The table in question (10.1.29-MariaDB):
[Table] => blog_search
[Create Table] => CREATE TABLE `blog_search` (
`bkey` varchar(28) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`blog_id` int(10) unsigned NOT NULL,
`follows` int(10) unsigned NOT NULL,
UNIQUE KEY `bkey` (`bkey`,`blog_id`),
UNIQUE KEY `blog_id` (`blog_id`,`bkey`) USING BTREE,
KEY `bkey_follows` (`bkey`,`follows`,`blog_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
bkey
is a word used in the title of a blog. The follows
data would be repeated per word per blog.
I had a Select that worked great without wildcards (key=>bkey_follow
chosen by DB):
SELECT blog_id
FROM blog_search
WHERE bkey = 'news' OR bkey = 'usa'
GROUP BY blog_id
HAVING COUNT(*) > 1
ORDER BY FOLLOWS DESC
LIMIT 30
But when I used wildcards (key=>bkey
chosen by DB):
WHERE bkey LIKE 'news%' OR bkey LIKE 'usa%'
It still is able to do a index search, but 'news' and 'newsletter' key words would match as COUNT(*) = 2, but it would not have found 'usa'
I made this new query, that works like I wanted, but if the last word have lots of results, the query becomes much slower (since queries are executed from back to front, and has to read 100% of the index) (key=>blog_id
chosen by DB):
SELECT blog_id
FROM blog_search
WHERE
blog_id IN (SELECT blog_id FROM blog_search WHERE bkey LIKE 'news%')
AND blog_id IN (SELECT blog_id FROM blog_search WHERE bkey LIKE 'usa%')
GROUP BY blog_id
ORDER BY follows DESC
LIMIT 30
If I FORCE INDEX (bkey)
or even better FORCE INDEX (bkey_follows)
it improves a little bit, like 35% but still is x10 times slower that I think should be possible.
I would like a query that's more optimized, without sub-queries. Something like this WRONG query, (but a correct version):
SELECT blog_id, 0 AS dist1, 0 AS dist2
FROM blog_search
WHERE ( bkey LIKE 'news%' AND 1=dist1=1 )
OR ( bkey LIKE 'usa%' AND 1=dist2=1 )
GROUP BY blog_id
HAVING (dist1=1 AND dist2=1)
ORDER BY follows DESC
LIMIT 30
Additional data: I'm developing searches for titles of blogs/posts. I tried many things: Regex, LIKE %...%, and FULLTEXT index which I "approved" to use for relevancy ordered results. The more promising search is the one discussed in this question, because is the best for it's speed and at ordering results by 'follows'.
You don't care how many rows match. Many other solutions lead to the "explode-implode" syndrome. That is where you gather lots of rows, only to later boil down to just a few via GROUP BY
. This avoids half of that by using a "semi-join", EXISTS
.
SELECT DISTINCT blog_id
FROM blog_search AS o
WHERE WHERE bkey LIKE 'news%'
AND EXISTS ( SELECT blog_id FROM blog_search
WHERE bkey LIKE 'usa%'
AND blog_id = o.blog_id )
ORDER BY follows DESC
LIMIT 30