Search code examples
mysqlsqlcachingindices

MySQL performance boost after create & drop index


I have a large MySQL, MyISAM table of around 4 million rows running in a core 2 duo, 8G RAM laptop.

This table has 30 columns including varchar, decimal and int types.

I have an index on a varchar(16). Let's call this column: "indexed_varchar_column".

My query is

SELECT 9 columns FROM the_table WHERE indexed_varchar_column = 'something';

It always returns around 5000 rows for every 'something' I query against.

An EXPLAIN to the query returns this:

+----+-------------+-------------+------+----------------------------------------------------+--------------------------------------------+---------+-------+------+-------------+
| id | select_type | table       | type | possible_keys                                      | key                                        | key_len | ref   | rows | Extra       |
+----+-------------+-------------+------+----------------------------------------------------+--------------------------------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | the_table   | ref  | many indexes including indexed_varchar_column      | another_index NOT: indexed_varchar_column! | 19      | const | 5247 | Using where |
+----+-------------+-------------+------+----------------------------------------------------+--------------------------------------------+---------+-------+------+-------------+

First thing is I'm not sure is why another_index is chosen. In fact it chooses an index which is a composite index of indexed_varchar_column and another 2 columns (which form part of the selected ones). Perhaps this makes sense since it may make things a bit faster for not having to read 2 of the columns in the query. The real QUESTION is the following one:

The query takes 5 seconds for every 'something' I match. On the 2nd time I query against 'something' it takes 0.15 secs (I guess because the query is being cached). When I run another query against 'something_new' it takes again 5 seconds. So, it is consistent.

THE PROBLEM IS: I discovered that creating an index (another composite index including my indexed_varchar_column) and dropping it again produces that all further queries against new 'something_other' take only 0.15 secs. Please note that 1) I create an index 2) drop it again. So everything is in the same state.

I guess all the operations needed for building and dropping indices make the SQL engine to cache something that is then reused. When I run EXPLAIN on a query after all this I get exactly the same as before.

How can I proceed to understand what is cached in the create-drop index procedure so that I can cache it without manipulating indices?

UPDATE:

Following a comment from Marc B that suggested that when mySQL creates an index it internally does a SELECT... I tried the following:

SELECT * FROM my_table;

It took 30 secs and returned 4 million rows. The good thing is that all further queries are very fast again (until I reboot the system). Please note that after rebooting the queries are slow again. I guess this is because mySQL is using some sort of OS caching.

Any idea? How can I explicitly cache the table I guess?

UPDATE 2: Perhaps I should have mentioned that this table may be severely fragmented. It's 4 million rows but I remove lots of old fields regularly. I also add new ones. Since I had large gaps in IDs (for the rows deleted) every day I drop the primary index (ID) and create it again with consecutive numbers. The table may be then very fragmented and therefore IO must be an issue... Not sure what to do.


Solution

  • Thanks everybody for your help.

    Finally I discovered (thanks to the hint of Marc B) that my table was severely fragmented after many INSERTs and DELETEs. I updated the question with this info some hours ago. There are two things that help:

    1)

    ALTER TABLE my_table ORDER BY indexed_varchar_column;
    

    2) Running:

    myisamchk --sort-records=4 my_table.MYI  (where 4 corresponds to my index)
    

    I believe both commands are equivalent. Queries are fast even after a system reboot. I've put this ALTER TABLE ORDER BY command on a cron that is run everyday. It takes 2 minutes but it's worth it.