I'm working on MySQL 5.5.29-0ubuntu0.12.04.1.
I need to create a query that can sort its results either by date or by score.
I read the documentation and the posts here on Stack Overflow (specifically this) about how to optimize such a query, but I'm still struggling to do it well. The key finding is that, to avoid the use of a temporary table, the ORDER BY or GROUP BY must contain only columns from the first table in the join order. That is why I use STRAIGHT_JOIN and two slightly different queries.
To avoid confusion, I'm going to assign a number to each query configuration:
Query 1 takes about 2.5 seconds to complete:
SELECT STRAIGHT_JOIN item.id AS id
FROM item
INNER JOIN score ON item.id = score.item_id
LEFT JOIN url ON item.url_id = url.id
LEFT JOIN doc ON url.doc_id = doc.id
INNER JOIN feed ON feed.id = item.feed_id
INNER JOIN user_feed ON feed.id = user_feed.feed_id AND score.user_id = user_feed.user_id
LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id
JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id
WHERE score.user_id = 1 AND user_feed.id = 7
ORDER BY zen_time DESC
LIMIT 0, 10
Query 2 (the first two joined tables are swapped and the ordering column is different) takes only about 0.01 seconds to complete:
SELECT STRAIGHT_JOIN item.id AS id
FROM score
INNER JOIN item ON item.id = score.item_id
LEFT JOIN url ON item.url_id = url.id
LEFT JOIN doc ON url.doc_id = doc.id
INNER JOIN feed ON feed.id = item.feed_id
INNER JOIN user_feed ON feed.id = user_feed.feed_id AND score.user_id = user_feed.user_id
LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id
JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id
WHERE score.user_id = 1 AND user_feed.id = 7
ORDER BY score DESC
LIMIT 0, 10
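As a minimal sketch of how the plan for one of these queries can be inspected, the statement can be prefixed with EXPLAIN. The example reuses query 1 unchanged. The only assumption is that the `Extra` column of the first row is where "Using temporary; Using filesort" would appear when the ORDER BY cannot be resolved from the first table in the join order.

```sql
-- Show the execution plan for query 1 without running it.
-- With STRAIGHT_JOIN, the rows of the plan follow the order of the FROM clause,
-- so the first row describes `item`.
EXPLAIN
SELECT STRAIGHT_JOIN item.id AS id
FROM item
INNER JOIN score ON item.id = score.item_id
LEFT JOIN url ON item.url_id = url.id
LEFT JOIN doc ON url.doc_id = doc.id
INNER JOIN feed ON feed.id = item.feed_id
INNER JOIN user_feed ON feed.id = user_feed.feed_id AND score.user_id = user_feed.user_id
LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id
JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id
WHERE score.user_id = 1 AND user_feed.id = 7
ORDER BY zen_time DESC
LIMIT 0, 10;
```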
Following are the EXPLAIN results for the queries.
Explain for query 1:
Explain for query 2:
Explain for query 3:
Explain for query 4:
Profiler result for query 1:
Profiler result for query 2:
Profiler result for query 3:
Profiler result for query 4:
Following are the table definitions:
CREATE TABLE `doc` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`md5` char(32) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `Md5_index` (`md5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `feed` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`url` text NOT NULL,
`title` text,
PRIMARY KEY (`id`),
FULLTEXT KEY `Title_url_index` (`title`,`url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `item` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`feed_id` bigint(20) unsigned NOT NULL,
`url_id` bigint(20) unsigned DEFAULT NULL,
`md5` char(32) NOT NULL,
PRIMARY KEY (`id`),
KEY `Md5_index` (`md5`),
KEY `Zen_time_index` (`zen_time`),
KEY `Feed_index` (`feed_id`),
KEY `Url_index` (`url_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `score` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
`score` float DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`),
KEY `Score_index` (`score`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `star` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `unseen` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `url` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`doc_id` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `Doc_index` (`doc_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_Email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user_feed` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`feed_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `User_feed_index` (`user_id`,`feed_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here are the row counts for the tables involved in the query:
Score: 68657
Item: 197602
Url: 198354
Doc: 186113
Feed: 754
User_feed: 721
Star: 0
Unseen: 150762
Which approach should I take, given that my program needs to order results by both zen_time and score as fast as possible?
Because the query speeds differ so much, I decided to run a more detailed analysis based on the result sets I want to obtain.
I need four result sets:
The query therefore has to be adapted to those conditions, and its variable parts are:
All of the tests were executed with SELECT SQL_NO_CACHE.
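A sketch of what a single test run can look like, using query 2 as the example. It assumes session profiling is used to collect the timings, and the query id passed to SHOW PROFILE is a placeholder that has to be taken from the SHOW PROFILES output.

```sql
-- Enable profiling for the current session (supported in MySQL 5.5).
SET profiling = 1;

-- Query 2, with SQL_NO_CACHE so the query cache does not serve the result.
SELECT STRAIGHT_JOIN SQL_NO_CACHE item.id AS id
FROM score
INNER JOIN item ON item.id = score.item_id
LEFT JOIN url ON item.url_id = url.id
LEFT JOIN doc ON url.doc_id = doc.id
INNER JOIN feed ON feed.id = item.feed_id
INNER JOIN user_feed ON feed.id = user_feed.feed_id AND score.user_id = user_feed.user_id
LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id
JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id
WHERE score.user_id = 1 AND user_feed.id = 7
ORDER BY score DESC
LIMIT 0, 10;

-- List the profiled statements with their Query_ID and duration.
SHOW PROFILES;

-- Per-stage breakdown of one statement. The id 1 is an assumption;
-- use the Query_ID reported by SHOW PROFILES.
SHOW PROFILE FOR QUERY 1;
```

SQL_NO_CACHE only bypasses the query cache. The InnoDB buffer pool and the operating system's file cache can still make repeated runs faster than the first one.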
Following are the results:
Now it's clear what I have to do: