Having a table which has a primary key as varchar
and is used for looking up the with WHERE column like 'some_string%'
against a huge amount of data (millions) and huge amount of requests (1000 per seconds) I need to improve the performance of such query
FULLTEXT
?NOTE: It has been pointed out in the comment section: "Why don't you try it out and see whether you get any performance gain?"
Which I will eventually try it out as well offcourse, but the objective of the quesiton (and asking In StackOverflow) is not just 'would this run faster'? But also have some other opinion with experience in the subject, that may tell me whether is usefull, wont do any changes, dangerous, a better way etc..?
To run actual test and benchmark there are many factors and is very time consuming, hopefully someone that knows by heart the answer will tell. In order to truly test something over performance, especially on this nature, you need to test it in different environment, with a lot of test data and test different scenario. Is not enough to test it locally, you may get false result for whatever reason or this may cause issue in the future that I am not aware of
Giving this table
create table book
(
title varchar(100) not null,
author varchar(20) not null,
created datetime default CURRENT_TIMESTAMP not null,
primary key (title, author)
);
And we insert some test data
insert into book (author, title) values
('JohnDoe', 'bookSummer_1'),
('JohnDoe', 'bookSummer_2'),
('JohnDoe', 'bookSummer_3'),
('JohnDoe', 'bookSummer_4'),
('JohnDoe', 'bookWinder_1'),
('JohnDoe', 'bookWinder_2'),
('JohnDoe', 'bookWinder_3'),
('JohnDoe', 'bookWinder_4')
Then, we grab the title books by its prefix as so
SELECT author, title from book WHERE author='JohnDoe' and title like 'bookSummer_%';
Now, imagine that 1 author has millions of millions of different books, we the title prefix may be as longs as 100 chars. As you can see, the title
column is already a PRIMARY KEY
Would add the column title
an additional index of type FULLTEXT
be beneficial (and probably change the query?
So I mean add as so
ALTER TABLE book ADD FULLTEXT INDEX `title_fulltext_index` (`title`);
With your specific query, e.g. a prefix-search (like 'somestring%'
), a fulltext index makes no sense.
A fulltext index can improve performance if you look for words in the middle of a sentence, e.g. 'some text somestring'
. A normal index will not index somestring
, so to find it, you will have to look through the whole table. A fulltext index will index every word, so you can find 'some'
, 'text'
and 'somestring'
with a (fast) index lookup.
And there are some additional steps involved: a fulltext index cannot be used together with another index, so if you find the title and want to verify that the author matches, you have to read the actual row from the table.
If you instead look up the title with the primary key, you already read the row from the table (as reading the primary key means reading the table, at least for InnoDB), so you are already there.
It's hard to even come up with theoretical scenarios where a fulltext index would be faster for your situation. One situation might be if your table is really large, preferably too large to fit into memory, while the fulltext index is small and can be kept in cache, and you look for an entry that is not there. Then the fulltext index can give you the answer without slow disk reads. But even then, there are probably better ways to improve your setup.
By the way, the optimal index for your query based on your SELECT
query -> where author = 'string' and title like 'otherstring%'
is this index:
PRIMARY KEY (author, title)
And not
PRIMARY KEY (title, author)
Because has the columns in a different order than in your primary key.
This is because the index lookup cannot use the second column anymore when you use the first column for a partial match, but it can use both columns if the comparison on the first column is "=".
Giving a Phone book an a SELECT
query like:
SELECT first_name, last_name FROM phone_book
WHERE first_name = 'Federico' AND last_name LIKE 'B%'
PRIMARY KEY
index is: (last_name, first_name)
If I asked you to find everyone with the first_name
"Federico" and a last_name
like "B%", the first_name
doesn't reduce the number of entries you have to look at you will read all first_name
for entries that start with "B".
PRIMARY KEY
index is: (first_name, last_name)
You can skip all first names that are not "Federico", and all last names that don't start with "B".
If it makes a difference with your data, and especially if it makes enough difference to be worth the additional resourcesthat the index requires (e.g. disk space, time to update), is something you would have to test though.