Search code examples
mysqlperformanceselectindexingfull-text-search

MySQL | index FULLTEXT vs index PRIMARY KEY used in a WHERE like Query


Problem

Having a table which has a primary key as varchar and is used for looking up the with WHERE column like 'some_string%' against a huge amount of data (millions) and huge amount of requests (1000 per seconds) I need to improve the performance of such query

Questions

  • Would that primary key column work better if is also a index of type FULLTEXT?
  • Is there a faster way to look up records giving the below scenario, or that is good as it gets?

NOTE: It has been pointed out in the comment section: "Why don't you try it out and see whether you get any performance gain?"

Which I will eventually try it out as well offcourse, but the objective of the quesiton (and asking In StackOverflow) is not just 'would this run faster'? But also have some other opinion with experience in the subject, that may tell me whether is usefull, wont do any changes, dangerous, a better way etc..?

To run actual test and benchmark there are many factors and is very time consuming, hopefully someone that knows by heart the answer will tell. In order to truly test something over performance, especially on this nature, you need to test it in different environment, with a lot of test data and test different scenario. Is not enough to test it locally, you may get false result for whatever reason or this may cause issue in the future that I am not aware of

Scenario

Giving this table

create table book
(
    title       varchar(100)                       not null,
    author      varchar(20)                        not null,   
    created     datetime default CURRENT_TIMESTAMP not null,
    primary key (title, author)
);

And we insert some test data

insert into book (author, title) values
    ('JohnDoe', 'bookSummer_1'),
    ('JohnDoe', 'bookSummer_2'),
    ('JohnDoe', 'bookSummer_3'),
    ('JohnDoe', 'bookSummer_4'),
    ('JohnDoe', 'bookWinder_1'),
    ('JohnDoe', 'bookWinder_2'),
    ('JohnDoe', 'bookWinder_3'),
    ('JohnDoe', 'bookWinder_4')

Then, we grab the title books by its prefix as so

SELECT author, title from book WHERE author='JohnDoe' and title like 'bookSummer_%';

Now, imagine that 1 author has millions of millions of different books, we the title prefix may be as longs as 100 chars. As you can see, the title column is already a PRIMARY KEY Would add the column title an additional index of type FULLTEXT be beneficial (and probably change the query?

So I mean add as so

ALTER TABLE book ADD FULLTEXT INDEX `title_fulltext_index` (`title`);

Solution

  • With your specific query, e.g. a prefix-search (like 'somestring%'), a fulltext index makes no sense.

    A fulltext index can improve performance if you look for words in the middle of a sentence, e.g. 'some text somestring'. A normal index will not index somestring, so to find it, you will have to look through the whole table. A fulltext index will index every word, so you can find 'some', 'text' and 'somestring' with a (fast) index lookup.

    And there are some additional steps involved: a fulltext index cannot be used together with another index, so if you find the title and want to verify that the author matches, you have to read the actual row from the table.

    If you instead look up the title with the primary key, you already read the row from the table (as reading the primary key means reading the table, at least for InnoDB), so you are already there.

    It's hard to even come up with theoretical scenarios where a fulltext index would be faster for your situation. One situation might be if your table is really large, preferably too large to fit into memory, while the fulltext index is small and can be kept in cache, and you look for an entry that is not there. Then the fulltext index can give you the answer without slow disk reads. But even then, there are probably better ways to improve your setup.

    Optimal index for your query (Multiple-Part Indexes)

    By the way, the optimal index for your query based on your SELECT query -> where author = 'string' and title like 'otherstring%' is this index:

    PRIMARY KEY (author, title)
    

    And not

    PRIMARY KEY (title, author)
    

    Because has the columns in a different order than in your primary key.

    This is because the index lookup cannot use the second column anymore when you use the first column for a partial match, but it can use both columns if the comparison on the first column is "=".

    Imagine e.g. :

    Giving a Phone book an a SELECT query like:

    SELECT first_name, last_name FROM phone_book 
    WHERE first_name = 'Federico' AND last_name LIKE 'B%'
    
    • If PRIMARY KEY index is: (last_name, first_name)

    If I asked you to find everyone with the first_name "Federico" and a last_name like "B%", the first_name doesn't reduce the number of entries you have to look at you will read all first_name for entries that start with "B".

    • If PRIMARY KEY index is: (first_name, last_name)

    You can skip all first names that are not "Federico", and all last names that don't start with "B".

    If it makes a difference with your data, and especially if it makes enough difference to be worth the additional resourcesthat the index requires (e.g. disk space, time to update), is something you would have to test though.