Search code examples
mysqlindexingquery-optimizationcomposite-index

In MySQL, is a leftmost prefix of a composite index just as performant as an non-composite index?


MySQL's documentation says:

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

However, I want to know if there's any performance penalty for using this feature. Suppose I have a large table that stores when a user sees a comment:

CREATE TABLE `comment_views` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) NOT NULL,
  `comment_id` bigint(20) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `comment_views_comment_id_user_id_unique` (`comment_id`,`user_id`),
  KEY `comment_views_user_id_foreign` (`user_id`),
  CONSTRAINT `comment_views_comment_id_foreign` FOREIGN KEY (`comment_id`) REFERENCES `comments` (`id`),
  CONSTRAINT `comment_views_user_id_foreign` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=38821916 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

By specifying a composite key on (comment_id, user_id), and a non-composite key on user_id, I am covered to search by comment_id using the leftmost prefix rule. However, if there was an explicit key on just comment_id, would lookups that just specify comment_id (i.e. SELECT * FROM comment_views WHERE comment_id = 123) be faster?

Does that answer change if the composite key wasn't unique, the column is a string, or if the column is nullable? How about if I am instead using a range, join, dependent subquery, covering index query, or some other index access?

I tried running EXPLAIN with and without the extra index, and the explain didn't change other than the possible_keys showing my new index. In fact, it seemed like the EXPLAIN preferred the composite key. Maybe because it was first in the table definition or it just prefers a unique key (even though the leftmost prefix isn't unique). I don't know. I was unable to find more information on how MySQL actually achieves using the leftmost prefix under the hood.


Solution

  • The difference is so minor as to be ignorable.

    Debating points:

    • col1 INT, col2 INT -- once the index pages are cached, they are readily available for queries needing either INDEX(col1) or INDEX(col1 col2). This will sometimes be an advantage.
    • co11 INT, col2 VARCHAR(255) -- Now the composite index is bulk. So when you need only col1, it could be slower.
    • col1 TINYINT, col2 ... -- If col1's cardinality is low (e.g., true/false), INDEX(col1) will probably never be used, but INDEX(col1, col2) could be quite useful.
    • INDEX(col1), INDEX(col1, col2) obviously takes more disk space.

    Other notes:

    • Very rarely is BIGINT needed. For example, might you really have 2 billion users?
    • Is id really needed? Or would PRIMARY KEY(user_id, comment_id) suffice? (Or the opposite order.) That may help for several reasons.
    • Declaring a FOREIGN KEY will create an INDEX if needed. So, create the composite indexes before declaring the FKs.
    • We need to see the queries to make further suggestions.