Search code examples
mysqlsqlgroup-bymariadbsql-like

Select id's containing all indicated Values (Wildcard collisions problem) MariaDB


The table in question (10.1.29-MariaDB):

  [Table] => blog_search
  [Create Table] => CREATE TABLE `blog_search` (
  `bkey` varchar(28) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `blog_id` int(10) unsigned NOT NULL,
  `follows` int(10) unsigned NOT NULL,
  UNIQUE KEY `bkey` (`bkey`,`blog_id`),
  UNIQUE KEY `blog_id` (`blog_id`,`bkey`) USING BTREE,
  KEY `bkey_follows` (`bkey`,`follows`,`blog_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin

bkey is a word used in the title of a blog. The follows data would be repeated per word per blog.

I had a Select that worked great without wildcards (key=>bkey_follow chosen by DB):

SELECT blog_id
FROM blog_search
WHERE bkey = 'news' OR bkey = 'usa'
GROUP BY blog_id
HAVING COUNT(*) > 1
ORDER BY FOLLOWS DESC
LIMIT 30

But when I used wildcards (key=>bkey chosen by DB):

WHERE bkey LIKE 'news%' OR bkey LIKE 'usa%'

It still is able to do a index search, but 'news' and 'newsletter' key words would match as COUNT(*) = 2, but it would not have found 'usa'

I made this new query, that works like I wanted, but if the last word have lots of results, the query becomes much slower (since queries are executed from back to front, and has to read 100% of the index) (key=>blog_id chosen by DB):

SELECT blog_id
FROM blog_search
WHERE  
  blog_id IN (SELECT blog_id FROM blog_search WHERE bkey LIKE 'news%') 
  AND blog_id IN (SELECT blog_id FROM blog_search WHERE bkey LIKE 'usa%')
GROUP BY blog_id
ORDER BY follows DESC
LIMIT 30

If I FORCE INDEX (bkey) or even better FORCE INDEX (bkey_follows) it improves a little bit, like 35% but still is x10 times slower that I think should be possible.

I would like a query that's more optimized, without sub-queries. Something like this WRONG query, (but a correct version):

SELECT blog_id, 0 AS dist1, 0 AS dist2
FROM blog_search
WHERE ( bkey LIKE 'news%' AND 1=dist1=1 )
   OR ( bkey LIKE 'usa%' AND 1=dist2=1 )
GROUP BY blog_id
HAVING (dist1=1 AND dist2=1)
ORDER BY follows DESC
LIMIT 30

Additional data: I'm developing searches for titles of blogs/posts. I tried many things: Regex, LIKE %...%, and FULLTEXT index which I "approved" to use for relevancy ordered results. The more promising search is the one discussed in this question, because is the best for it's speed and at ordering results by 'follows'.


Solution

  • You don't care how many rows match. Many other solutions lead to the "explode-implode" syndrome. That is where you gather lots of rows, only to later boil down to just a few via GROUP BY. This avoids half of that by using a "semi-join", EXISTS.

    SELECT DISTINCT blog_id
        FROM blog_search AS o
        WHERE WHERE bkey LIKE 'news%'
          AND EXISTS ( SELECT blog_id FROM blog_search
                          WHERE bkey LIKE 'usa%'
                            AND blog_id = o.blog_id )
    ORDER BY follows DESC
    LIMIT 30