Search code examples
mysqlsqljoinquery-optimization

How to make JOIN query use index?


I have two tables:

CREATE TABLE `articles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(1000) DEFAULT NULL,
  `last_updated` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `last_updated` (`last_updated`),
) ENGINE=InnoDB AUTO_INCREMENT=799681 DEFAULT CHARSET=utf8 

CREATE TABLE `article_categories` (
  `article_id` int(11) NOT NULL DEFAULT '0',
  `category_id` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`article_id`,`category_id`),
  KEY `category_id` (`category_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

This is my query:

SELECT a.*
FROM
    articles AS a,
    article_categories AS c
WHERE
    a.id = c.article_id
    AND c.category_id = 78
    AND a.comment_cnt > 0
    AND a.deleted = 0
ORDER BY a.last_updated
LIMIT 100, 20

And an EXPLAIN for it:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: index
possible_keys: PRIMARY
          key: last_updated
      key_len: 9
          ref: NULL
         rows: 2040
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: eq_ref
possible_keys: PRIMARY,fandom_id
          key: PRIMARY
      key_len: 8
          ref: db.a.id,const
         rows: 1
        Extra: Using index

It uses a full index scan of last_updated on the first table for sorting but does not use any index for join (type: index in explain). This is very bad for performance and kills the whole database server since this is a very frequent query.

I've tried reversing table order with STRAIGHT_JOIN, but this gives filesort, using_temporary, which is even worse.

Is there any way to make MySQL use index for joining and for sorting at the same time?

=== update ===

I'm really desperate about this. Maybe some kind of denormalization can help here?


Solution

  • If you have lots of categories, this query cannot be made efficient. No single index can cover two tables at once in MySQL.

    You have to do denormalization: add last_updated, has_comments and deleted into article_categories:

    CREATE TABLE `article_categories` (
      `article_id` int(11) NOT NULL DEFAULT '0',
      `category_id` int(11) NOT NULL DEFAULT '0',
      `last_updated` timestamp NOT NULL,
      `has_comments` boolean NOT NULL,
      `deleted` boolean NOT NULL,
      PRIMARY KEY (`article_id`,`category_id`),
      KEY `category_id` (`category_id`),
      KEY `ix_articlecategories_category_comments_deleted_updated` (category_id, has_comments, deleted, last_updated)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
    

    and run this query:

    SELECT  *
    FROM    (
            SELECT  article_id
            FROM    article_categories
            WHERE   (category_id, has_comments, deleted) = (78, 1, 0)
            ORDER BY
                    last_updated DESC
            LIMIT   100, 20
            ) q
    JOIN    articles a
    ON      a.id = q.article_id
    

    Of course you should update article_categories as well whenever you update relevant columns in article. This can be done in a trigger.

    Note that the column has_comments is boolean: this will allow using an equality predicate to make a single range scan over the index.

    Also note that the LIMIT goes into the subquery. This makes MySQL use late row lookups which it does not use by default. See this article in my blog about why do they increase performance:

    If you were on SQL Server, you could make an indexable view over your query, which essentially would make a denormalized indexed copy of article_categories with the additional fields, automatically mainained by the server.

    Unfortunately, MySQL does not support this and you will have to create such a table manually and write additional code to keep it in sync with the base tables.