I have two MySQL tables:
CREATE TABLE `shows` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`data` text NOT NULL,
`created_at` date NOT NULL DEFAULT current_timestamp(),
PRIMARY KEY (`id`),
KEY `created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `clicks` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`show_id` int(11) unsigned NOT NULL,
`data` text NOT NULL,
`created_at` date NOT NULL DEFAULT current_timestamp(),
PRIMARY KEY (`id`),
KEY `created_at` (`created_at`),
KEY `show_constaraint_idx` (`show_id`),
CONSTRAINT `show_constaraint` FOREIGN KEY (`show_id`) REFERENCES `shows` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Each table has 40 000 000 records.
I added 40 000 000 records to shows table:
drop procedure if exists doWhile;
DELIMITER //
CREATE PROCEDURE doWhile()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE (i <= 40000000) DO
INSERT INTO `shows` (data, created_at) values (
CONCAT(MD5(RAND()), MD5(RAND())),
FROM_UNIXTIME(unix_timestamp('2021-01-01') + floor(rand() * (unix_timestamp('2021-12-31') - unix_timestamp('2021-01-01') + 1)))
);
SET i = i + 1;
END WHILE;
END;
//
CALL doWhile();
I added 40 000 000 records to clicks table:
drop procedure if exists doWhile;
DELIMITER //
CREATE PROCEDURE doWhile()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE (i <= 40000000) DO
INSERT INTO `clicks` (show_id, data, created_at) values (
(FLOOR(1 + RAND() * 40000000)),
CONCAT(MD5(RAND()), MD5(RAND())),
FROM_UNIXTIME(unix_timestamp('2021-01-01') + floor(rand() * (unix_timestamp('2021-12-31') - unix_timestamp('2021-01-01') + 1)))
);
SET i = i + 1;
END WHILE;
END;
//
CALL doWhile();
SQL query execution time is 10 sec:
SELECT shows.id,
shows.data,
clicks.id,
clicks.data
FROM clicks
INNER JOIN shows ON shows.id = clicks.show_id
WHERE shows.created_at = '2021-03-03'
AND clicks.created_at >= '2021-03-03'
AND clicks.created_at <= '2021-03-06'
I tried to optimize SQL query, but SQL query execution time is 10 sec:
SELECT sub_shows.id,
sub_shows.data,
sub_clicks.id,
sub_clicks.data
FROM (
SELECT clicks.id,
clicks.show_id,
clicks.data
FROM clicks
WHERE clicks.created_at >= '2021-03-03' AND clicks.created_at <= '2021-03-06'
) as sub_clicks
INNER JOIN (
SELECT shows.id,
shows.data
FROM shows
WHERE shows.created_at = '2021-03-03'
) as sub_shows ON sub_shows.id = sub_clicks.show_id
EXPLAIN same for both queries:
How can I optimize the query?
Answer: Change innodb_buffer_pool_size to 12G (Rick James)
You would be better to have a compound index covering BOTH parts of the join criteria vs individual.
Suggest Indexes as listed
Shows ( created_at, id )
Clicks ( show_id, created_at )
The reason for both keys to help is that an index utilize both parts without having to go to the raw data pages to qualify the records. Only after qualifying does it need to retrieve the additional columns in result set.
I would also adjust and reverse the query to the smallest granular item... The SHOWS table on the single date in the primary FROM position of the query.
SELECT
shows.id,
shows.data,
c.id,
c.data
FROM
shows s
JOIN clicks c
ON s.id = c.show_id
AND c.created_at >= '2021-03-03'
AND c.created_at <= '2021-03-06'
WHERE
s.created_at = '2021-03-03'