Search code examples
mysqljoinwhere-clause

How to optimize MySQL query with JOIN and WHERE?


I have two MySQL tables:

  • shows (banner shows)
  • clicks (banner clicks).
CREATE TABLE `shows` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`data` text NOT NULL,
`created_at` date NOT NULL DEFAULT current_timestamp(),
PRIMARY KEY (`id`),
KEY `created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `clicks` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`show_id` int(11) unsigned NOT NULL,
`data` text NOT NULL,
`created_at` date NOT NULL DEFAULT current_timestamp(),
PRIMARY KEY (`id`),
KEY `created_at` (`created_at`),
KEY `show_constaraint_idx` (`show_id`),
CONSTRAINT `show_constaraint` FOREIGN KEY (`show_id`) REFERENCES `shows` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Each table has 40 000 000 records.

I added 40 000 000 records to shows table:

drop procedure if exists doWhile;
DELIMITER //
CREATE PROCEDURE doWhile()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE (i <= 40000000) DO
INSERT INTO `shows` (data, created_at) values (
    CONCAT(MD5(RAND()), MD5(RAND())),
    FROM_UNIXTIME(unix_timestamp('2021-01-01') + floor(rand() * (unix_timestamp('2021-12-31') - unix_timestamp('2021-01-01') + 1)))
);
SET i = i + 1;
END WHILE;
END;
//
CALL doWhile();

I added 40 000 000 records to clicks table:

drop procedure if exists doWhile;
DELIMITER //
CREATE PROCEDURE doWhile()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE (i <= 40000000) DO
INSERT INTO `clicks` (show_id, data, created_at) values (
    (FLOOR(1 + RAND() * 40000000)),
    CONCAT(MD5(RAND()), MD5(RAND())),
    FROM_UNIXTIME(unix_timestamp('2021-01-01') + floor(rand() * (unix_timestamp('2021-12-31') - unix_timestamp('2021-01-01') + 1)))
);
SET i = i + 1;
END WHILE;
END;
//
CALL doWhile();

SQL query execution time is 10 sec:

SELECT shows.id,
    shows.data,
    clicks.id,
    clicks.data
FROM clicks
INNER JOIN shows ON shows.id = clicks.show_id
WHERE shows.created_at = '2021-03-03'
AND clicks.created_at >= '2021-03-03'
AND clicks.created_at <= '2021-03-06'

I tried to optimize SQL query, but SQL query execution time is 10 sec:

SELECT sub_shows.id,
    sub_shows.data,
    sub_clicks.id,
    sub_clicks.data
FROM (
    SELECT clicks.id,
        clicks.show_id,
        clicks.data
    FROM clicks
    WHERE clicks.created_at >= '2021-03-03' AND clicks.created_at <= '2021-03-06'
) as sub_clicks
INNER JOIN (
    SELECT shows.id,
        shows.data
    FROM shows
    WHERE shows.created_at = '2021-03-03'
) as sub_shows ON sub_shows.id = sub_clicks.show_id

EXPLAIN same for both queries: enter image description here

How can I optimize the query?

Answer: Change innodb_buffer_pool_size to 12G (Rick James)


Solution

  • You would be better to have a compound index covering BOTH parts of the join criteria vs individual.

    Suggest Indexes as listed
    Shows    ( created_at, id )
    Clicks   ( show_id, created_at )
    

    The reason for both keys to help is that an index utilize both parts without having to go to the raw data pages to qualify the records. Only after qualifying does it need to retrieve the additional columns in result set.

    I would also adjust and reverse the query to the smallest granular item... The SHOWS table on the single date in the primary FROM position of the query.

    SELECT 
          shows.id,
          shows.data,
          c.id,
          c.data
       FROM
          shows s
             JOIN clicks c
                ON s.id = c.show_id
              AND c.created_at >= '2021-03-03'
              AND c.created_at <= '2021-03-06'   
       WHERE 
          s.created_at = '2021-03-03'