Here is the query I'm trying to launch:
SELECT c.creative_id, c.creative_title, c.creative_image_name, c.gravity, c.ad_strength
FROM creatives AS c
INNER JOIN term_relationships AS tr ON c.creative_id = tr.creative_id
WHERE tr.term_id
IN ( 14, 1, 50, 76, 104 )
GROUP BY c.creative_id
HAVING COUNT(tr.term_id ) =5
ORDER BY c.gravity ASC
LIMIT 30;
Here is what EXPLAIN
for this query outputs:
Here is the creatives
table structure:
CREATE TABLE `creatives` (
`creative_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`scraper_id` bigint(20) unsigned DEFAULT NULL,
`creative_title` varchar(255) NOT NULL,
`creative_image_name` varchar(255) DEFAULT NULL,
`image_attrib` varchar(12) DEFAULT NULL,
`original_image_name` varchar(255) DEFAULT NULL,
`creative_subtext` varchar(255) DEFAULT NULL,
`dest_url` varchar(2083) NOT NULL,
`lp_url` varchar(2083) NOT NULL,
`lp_image_name` varchar(255) DEFAULT NULL,
`lp_image_flag` tinyint(1) unsigned NOT NULL DEFAULT '0',
`creative_first_seen` date NOT NULL,
`creative_last_seen` date NOT NULL,
`daily_ad_count` int(5) unsigned NOT NULL,
`ad_strength` int(11) unsigned NOT NULL,
`prev_ad_strength` int(11) unsigned DEFAULT NULL,
`gravity` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`creative_id`),
KEY `gravity` (`gravity`)
) ENGINE=InnoDB AUTO_INCREMENT=173037591 DEFAULT CHARSET=utf8
I'm concerned about Using temporary; using filesort
when launching both with GROUP BY
and ORDER BY
on another column. If I remove ORDER BY
, the temporary and filesort go away and the query runs really fast.
What I don't understand, why mysql needs temporary table, why can't it first where filter + sort by c.gravity
, then group by the resulting table and filter according to HAVING
clause. The filtered table will be sorted by c.gravity
correctly as the gravity value remains unchanged after the grouping and having filter.
What I tried:
Selected everything without ORDER BY
, wrapped into a subquery and joined again on creatives
table - same result, using temporary, filesort and slow
tried to add FORCE USE INDEX FOR ORDER BY (gravity)
and it doesn't change anything. EXPLAIN
and execution time remain the same.
UPDATE: the question has been answered by @Rick and it's really much faster with his correlated subquery and not using GROUP BY
. I'm adding here an EXPLAIN
output for the query:
And the output of SHOW CREATE TABLE term_relationships
with the newly created index:
And one more question to @Rick: why do we need the outer query with c3
? It seems just to join creatives
on its own one more just to get the values from other columns and order the records by gravity. However, they are already sorted with the inner query and we can easily add missing columns in c1
making it:
SELECT c1.creative_id,c1.creative_title,c1.creative_image_name,c1.gravity, c1.ad_strength
FROM creatives AS c1
WHERE
( SELECT COUNT(*)
FROM term_relationships
WHERE c1.creative_id = creative_id
AND term_id IN ( 14, 1, 50, 76, 104 )
) = 5
ORDER BY c1.gravity ASC
LIMIT 30;
Is my understanding correct or am I missing something in your query?
Temp table and filesort are not the villains, per se. It's how bulky they are.
This may look more complex, but it may be faster:
SELECT c3.creative_id,
c3.creative_title, c3.creative_image_name,
c3.gravity, c3.ad_strength
FROM
( SELECT creative_id
FROM creatives AS c1
WHERE
( SELECT COUNT(*)
FROM term_relationships
WHERE c1.creative_id = creative_id
AND term_id IN ( 14, 1, 50, 76, 104 )
) = 5
ORDER BY c1.gravity ASC
LIMIT 30
) AS c2
JOIN creatives c3 USING (creative_id)
ORDER BY c3.gravity
If it happens to use INDEX(gravity)
for the inner query, then it will stop after finding 30 rows that have all 5 transactions. If it generates a tmp table, it will be only 30 rows -- much better than with your original query. Note also, that the tmp table will be narrower -- only creative_id
will be in it. Finally it reaches back into creatives
to get the rest of the desired columns. Finally, there will be another sort, but with only 30 rows.
Furthermore, "filesort" is often a very fast sort in RAM, not really a "file" sort. I'm pretty sure my query will not be on disk.
term_relationships
needs this composite index: INDEX(creative_id, term_id)
.