Search code examples
mysqlindexingclustered-index

Why can MySQL not use a partial primary key index?


The MySQL documentation describing the use of index extensions, gives the following table as an example, followed by the query below:

CREATE TABLE t1 (
    i1 INT NOT NULL DEFAULT 0,
    i2 INT NOT NULL DEFAULT 0,
    d DATE DEFAULT NULL,
    PRIMARY KEY (i1, i2),
    INDEX k_d (d)
) ENGINE = InnoDB;

SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';

InnoDB internally will convert the index k_d to include the primary key at the end. That is, the actual index k_d will be on (d, i1, i2), three columns.

The documentation goes on to explain that (emphasis mine):

The optimizer cannot use the primary key in this case because that comprises columns (i1, i2) and the query does not refer to i2. Instead, the optimizer can use the secondary index k_d on (d), and the execution plan depends on whether the extended index is used.

I am confused by the above statement. First it says that i1 is not enough to use the primary key index of two columns (i1, i2). Then, in the second sentence, it says that the index k_d on (d, i1, i2) can be used, despite that only d and i1 are being used, with i2 absent.

My general understanding of indices in MySQL, and in other flavors of SQL, is that a left portion of an index can be used if a subset of all columns in the index are present, starting from the left.

What is different about a primary key (clustered) index and a non clustered secondary index which allows the latter to use a partial index, but the former cannot?


Solution

  • The documentation is partially inaccurate on the page you linked to.

    Demo, run on MySQL 5.7.21:

    mysql [localhost] {msandbox} (test) > CREATE TABLE t1 (
        ->     i1 INT NOT NULL DEFAULT 0,
        ->     i2 INT NOT NULL DEFAULT 0,
        ->     d DATE DEFAULT NULL,
        ->     PRIMARY KEY (i1, i2),
        ->     INDEX k_d (d)
        -> ) ENGINE = InnoDB;
    
    mysql [localhost] {msandbox} (test) > explain SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
    +----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
    | id | select_type | table | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
    +----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
    |  1 | SIMPLE      | t1    | NULL       | ref  | PRIMARY,k_d   | PRIMARY | 4       | const |    1 |   100.00 | Using where |
    +----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
    

    This query chooses the PRIMARY index, and you can see the key_len is 4, proving it will use only one of the 32-bit INT columns.

    When using InnoDB tables, MySQL often prefers to use the PRIMARY index (the clustered index) because it's more efficient than using a secondary index.