I have 2 tables (pages and comments), around 130 000 rows each.
I want to list pages without any comments (foreign key is comments.page_id)
If I execute the normal left outer join, it takes an amazing more than 750 seconds to run. (130k^2 = 17B). Whereas if I execute the same join, but using subqueries for the tables, it takes just 1 second.
Server version: 5.6.44-log - MySQL Community Server (GPL):
SELECT p.id
FROM `pages` AS p
LEFT JOIN `comments` AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
SELECT p.id
FROM (
SELECT id FROM `pages`
) AS p
LEFT JOIN `comments` AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
SELECT p.id
FROM `pages` AS p
LEFT JOIN (
SELECT * FROM `comments`
) AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
SELECT p.id
FROM (
SELECT id FROM `pages`
) AS p
LEFT JOIN (
SELECT * FROM `comments`
) AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
SELECT p.id
FROM (
SELECT id FROM `pages`
) AS p
LEFT JOIN (
SELECT page_id FROM `comments`
) AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
SELECT p.id
FROM `pages` AS p
WHERE NOT EXISTS( SELECT page_id FROM `comments`
WHERE page_id = p.id );;
Now, in MySql version 5.7, all of the above queries take "too much time" to execute.
In MySql 5.7, query 1 and 4 have same explanation:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 SIMPLE p NULL index PRIMARY PRIMARY 4 NULL 147626 100.00 Using index; Using temporary; Using filesort
1 SIMPLE c NULL ALL NULL NULL NULL NULL 147790 10.00 Using where; Not exists; Using join buffer (Block Nested Loop)
In MySql 5.6, unfortunately I cannot get the explanation for query 1 right now (taking too much time), but for query 4 is the below:
id select_type table type possible_keys key key_len ref rows Extra
---------------------------------------------------------------------------------------------------------------------------
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 147626 Using temporary; Using filesort
1 PRIMARY <derived3> ref <auto_key0> <auto_key0> 4 p.id 10 Using where; Not exists
3 DERIVED comments ALL NULL NULL NULL NULL 147790 NULL
2 DERIVED pages index NULL PRIMARY 4 NULL 147626 Using index
Tables:
CREATE TABLE `pages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`identifier` varchar(250) NOT NULL DEFAULT '',
`reference` varchar(250) NOT NULL DEFAULT '',
`url` varchar(1000) NOT NULL DEFAULT '',
`moderate` varchar(250) NOT NULL DEFAULT 'default',
`is_form_enabled` tinyint(1) unsigned NOT NULL DEFAULT '1',
`date_modified` datetime NOT NULL,
`date_added` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=147627 DEFAULT CHARSET=utf8
CREATE TABLE `comments` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL DEFAULT '0',
`page_id` int(10) unsigned NOT NULL DEFAULT '0',
`website` varchar(250) NOT NULL DEFAULT '',
`town` varchar(250) NOT NULL DEFAULT '',
`state_id` int(10) NOT NULL DEFAULT '0',
`country_id` int(10) NOT NULL DEFAULT '0',
`rating` tinyint(1) unsigned NOT NULL DEFAULT '0',
`reply_to` int(10) unsigned NOT NULL DEFAULT '0',
`comment` text NOT NULL,
`reply` text NOT NULL,
`ip_address` varchar(250) NOT NULL DEFAULT '',
`is_approved` tinyint(1) unsigned NOT NULL DEFAULT '1',
`notes` text NOT NULL,
`is_admin` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_sent` tinyint(1) unsigned NOT NULL DEFAULT '0',
`sent_to` int(10) unsigned NOT NULL DEFAULT '0',
`likes` int(10) unsigned NOT NULL DEFAULT '0',
`dislikes` int(10) unsigned NOT NULL DEFAULT '0',
`reports` int(10) unsigned NOT NULL DEFAULT '0',
`is_sticky` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_locked` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_verified` tinyint(1) unsigned NOT NULL DEFAULT '0',
`date_modified` datetime NOT NULL,
`date_added` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=147879 DEFAULT CHARSET=utf8
Why is this happening? What does MySql do under the hood?
Does this happen only in MySql, or any other Sql as well?
How can I write a fast query to get what I need? (In both v 5.6, 5.7)
The problem with your long-running queries, is that you lack an index on the page_id column of the comments table. Hence, for each row from the pages table, you need to check all rows of the comments table. Since you are using LEFT JOIN, this is the only possible join order. What happens in 5.6, is that when you use a subquery in the FROM clause (aka derived table), MySQL will create an index on the temporary table used for the result of the derived table (auto_key0 in the EXPLAIN output). The reason it is faster when you only select one column, is that the temporary table will be smaller.
In MySQL 5.7, such derived tables will be automatically merge into the main query, if possible. This is done to avoid the extra temporary tables. However, this means that you no longer have an index to use for the join. (See this blog post for details.)
You have two options to improve the query time in 5.7:
In MySQL 8.0, you can also use an optimizer hint to avoid the merging. In your case, that would be something like
SELECT /*+ NO_MERGE(c) */ ... FROM
See slides 34-37 of this presentation for examples of how to use such hints.