Search code examples
regexsearchtagssql-likedivide

Fastest way to search a varchar column in MySQL


I want to implement a search query for a bookshop. I use MySQL and I have a varchar column which contains name, author or other details like The Tragedy of Hamlet, Prince of Denmark, by William Shakespeare and I want to search like shakespeare tragedy or denmark tragedy to have a list of books have them in their one column.

I have three queries to implement this but I want to know about their performance.

LIKE %%

my first way is to split search text into words and create a dynamic command based on word counts:

SELECT * FROM books
WHERE name LIKE '%shakespeare%'
AND name LIKE '%tragedy%'

But I was told that like is a slow operator specially with two % because it can not use index.


TAG table and relational division

My second way is to have another table which contains tags like:

-------------------------
| book_id |    tag      |
|-----------------------|
|    1    | Tragedy     |
|    1    | Hamlet      |
|    1    | Prince      |
|    1    | Denmark     |
|    1    | William     |
|    1    | Shakespeare |
-------------------------

And create a dynamic divide command:

SELECT DISTINCT book_id FROM booktag AS b1 
WHERE ((SELECT 'shakespeare' as tag UNION SELECT 'tragedy' as tag)
       EXCEPT
       SELECT tag FROM booktag AS b2 WHERE b1.book_id = b2.book_id) IS NULL

But I was told that relational division is so slow too.


REGEXP

My third way is to use regular expressions:

SELECT * FROM books
WHERE name REGEXP '(?=.*shakespeare)(?=.*tragedy)'

But someone told me that it is slower than LIKE

Please help me decide which way is faster?


Solution

  • Surely using LIKE which is a built-in operand, is more optimized than Regular expression. But there is an important point here that you can not compare these two recipes together, because LIKE used to add a wildcard to string and regex is for matching a string based on a pattern which can be much complex.

    Anyway the best ways which come in my mind for this aim, would be one of the followings:

    1. Use LIKE on your column which has been indexed properly.1
    2. Using some optimized search technologies like elastic search.
    3. Implement a multithreading algorithm 2 which performs very good with IO tasks. For this one you can use some tricks like defining an offset and divide the table among the threads.

    Also for some alternative ways read this article https://technet.microsoft.com/en-us/library/aa175787%28v=sql.80%29.aspx


    1. You should be careful of the way you put the indices on your columns.read this answer for more info https://stackoverflow.com/a/10354292/2867928 and this post http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning

    2.Read this answer for more info Multi Thread in SQL?