I'm using MySQL 5.6 and am having some trouble with performance of queries in some very large tables. Specifically I'm unsure how to index my tables properly when there are both references between the tables as well as an in() condition.
Simplified tables are as follows (A and B are both huge tables whereas C is a small table with about 20 rows, all tables are InnoDB)
A(id int, created datetime, val int)
B(id int, a_id int, c_id int)
C(id int, val int)
The query in question looks something like this:
Select a.id
from a
join b ON (a.id = b.a_id)
where a.created >= now() - interval 90 day
and a.val = 0
and b.c_id in (
SELECT id
from c
where val = 1 )
I have created an index on A as (val, created, id) and one on B as (c_id, a_id) which works great when there is an '=' condition on c_id (c_id = 5 for example) however with the 'in()' condition the explain is telling me that my index on A is not being used but instead it is using the primary key index and this query is taking forever. Forcing the use of my index does not seem to help either.
Any tips or ideas on how to index this better or otherwise increase performance of this type of query?
IN ( SELECT ... )
is less efficient than JOIN
.
Select a.id
from a
join b ON (a.id = b.a_id)
JOIN c ON b.c_id = c.id
where a.created >= now() - interval 90 day
and a.val = 0
and c.val = 1
Indexes:
A: INDEX(val, created) -- in that order
B: INDEX(a_id, c_id) -- in that order; "covering"
C: Nothing new needed, assuming you have PRIMARY KEY(id) and InnoDB
(Edit) The indexes assume the tables will be done in this order: A, B, C. It is very likely that it will because... A
probably has the best selectivity in the WHERE
. Obviously, B
, then C
come next. Hence, my ordering of the index for B
.
Assuming A's PK is (id)
, then INDEX(val, created)
is identical to INDEX(val, created, id)
(as you proposed).
With a 'derived' table formulation, the Optimizer 'must' start with C
, then move on to B
and, finally, A
:
C: INDEX(val, id) -- (again, `id` optional)
B: INDEX(c_id, a_id) -- as you stated
A: Given that it has `PRIMARY KEY(id)` and is InnoDB, no index is useful.
Because of the inability of filtering on a.val and a.created, I predict that even this formulation will be slower than mine:
Select a.id
FROM ( SELECT id FROM C WHERE val = 1 ) AS cx
JOIN B ON b.c_id = cx.id
JOIN A ON (a.id = b.a_id)
where a.created >= now() - interval 90 day
and a.val = 0
Index Cookbook. If B is a many:many mapping table, then take special note of the section on that topic.