Search code examples
mysqlselectmariadbinnodbexplain

How to index a MySQL InnoDB table to query with select ... where key in ( some values here)?


I have a mariadb 10.3 server, and the following table (using InnoDB storage engine):

create table if not exists token (
   `token` bigint unsigned not null,
    `uid` smallint unsigned not null default 0,
    `nham` int default 0,
    `nspam` int default 0,
    `timestamp` int unsigned default 0
) Engine=InnoDB;

create index token_idx1 on token(token);
create index token_idx2 on token(uid);

The token table has ~900k rows, and I'd like to execute the following query with a 2-300 numbers in the IN ( ) clause:

select token, nham, nspam from token where token in (1,2,3,4,...);

Now the problem: the query executes very slowly, and it just won't use token_idx1:

+------+-------------+-------+------+---------------+------------+---------+-------+--------+-------------+
|  id  | select_type | table | type | possible_keys |     key    | key_len |  ref  |  rows  |    Extra    |
+------+-------------+-------+------+---------------+------------+---------+-------+--------+-------------+
|   1  |    SIMPLE   | token |  ref |   token_idx1  | token_idx1 |    2    | const | 837534 | Using where |
+------+-------------+-------+------+---------------+------------+---------+-------+--------+-------------+

Since the token column is indexed, I'm surprised that explain select says that the optimizer has no interest on token_idx1 (and the query takes a long time, ~30 sec due to the full table scan).

How to fix the issue? I'm aware that I could use USE INDEX(token_idx1) in the query, but I'd solve it without such hack.


Solution

  • The solution is to rewrite the query. So while a query like this sucks at performance:

    select token, nham, nspam from token where token in (1,2,3,4,...);

    The following query is fast as it should be (even if some token values don't exist in the table):

    select token, nham, nspam from token where token=1 or token=2 or token=3 or ...;

    So the issue is solved, though I still don't understand why the optimizer has a hard time for the first query.

    Anyway, thank you for all of your thoughts, ideas and contributions leading me to the workaround.