a
and Table b
, described below:-------------------
CREATE TABLE a (
id int(11) NOT NULL AUTO_INCREMENT,
a1 varchar(255) NULL,
a2 varchar(255) NULL,
a3 varchar(255) NULL,
a4 varchar(255) NULL,
b5 varchar(255) NULL,
PRIMARY KEY (id),
INDEX a_idx_b5 (b5) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;
-------------------
CREATE TABLE b (
id int(11) NOT NULL AUTO_INCREMENT,
B1 varchar(255) NULL,
B2 varchar(255) NULL,
B3 varchar(255) NULL,
B4 int(255) NULL,
B5 varchar(255) NULL,
PRIMARY KEY (id),
INDEX b_idx_b5 (B5) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;
-------------------
And I want to get data from table a
and table b
with the following SQL query:
select
a.*,
max_b4
from
a,
(
select
b5,
max(b4) as max_b4
from
b
group by
b5
) c
where
a.b5 = c.b5
When I have a lots of data (more than one million) in table a
and table b
it become very slow, so then I explain
the SQL, and I got this result:
+----+-------------+------------+------------+-------+---------------+----------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+----------+---------+------+---------+----------+-------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 3209319 | 100.00 | Using where |
| 1 | PRIMARY | a | NULL | ref | a_idx_b5 | a_idx_b5 | 1023 | c.b5 | 459 | 100.00 | NULL |
| 2 | DERIVED | b | NULL | index | b_idx_b5 | b_idx_b5 | 1023 | NULL | 3209319 | 100.00 | NULL |
+----+-------------+------------+------------+-------+---------------+----------+---------+------+---------+----------+-------------+
As you can see the <derived2>
table don't have a <auto_key>
which mysql will add to the subquery after 5.6! So this result why the sql become very slow.
But as the mysql doc describes derived-table-optimization:
The optimizer constructs an index over column f1 from derived_t2 if doing so would permit the use of ref access for the lowest cost execution plan.
Usually mysql will add an index in subquery, so my question is why mysql do not give an index in <derived2>
table?
The index b_idx_b5
on the b
table is not being used because it does not speed up the GROUP BY
subquery. You should be using this index:
CREATE INDEX idx ON b (b5, b4);
This should allow MySQL to very efficiently find the max b4
values for each group of b5
records.
As a side note, I probably would have written your query using an explicit join, rather than an implicit one:
SELECT t1.*, t2.max_b4
FROM a t1
INNER JOIN
(
SELECT b5, MAX(b4) AS max_b4
FROM b
GROUP BY b5
) t2
ON t2.b5 = t1.b5;
But, both the above version and your original version would probably have the same execution plan.