Search code examples
mysqldatabase-partitioning

In a table partitioned BY LIST (n), how can I quickly SELECT MAX(n) without an index?


I created a MyISAM table that stores rows in different partitions based on the year column, with each year given its own partition, e.g.:

CREATE TABLE t (
  id INT UNSIGNED AUTO_INCREMENT
, year SMALLINT(4) UNSIGNED
, ...
, PRIMARY KEY (id, year)
) ENGINE=MyISAM
PARTITION BY LIST (year) (
  PARTITION p0 VALUES IN (2000)
, PARTITION p1 VALUES IN (2001)
, ...
);

The question I want to answer is simple; what's the greatest value of year in the table? In other words:

SELECT MAX(year) FROM t;

It seems, from looking at the table definition, that there should be an execution plan for this query that runs in constant time without reordering the PK or creating a new index on year. All the database needs to do is find the partition that's defined as having the maximum value, and then check whether any rows exist in that partition; if not, check the next largest, and so on. Unfortunately, the optimizer chooses instead to do a full table scan.

I can easily query INFORMATION_SCHEMA to find the largest value of year for which there is a defined partition:

SELECT MAX(PARTITION_DESCRIPTION)
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 't';

However, that only tells me about the partitions defined on the table, not the rows that exist in the table; if some partitions could be empty, it's not necessarily going to give me the MAX(year). I tried adding an EXISTS condition:

SELECT MAX(PARTITION_DESCRIPTION)
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 't'
    AND EXISTS(SELECT 1 FROM fact_registration 
               WHERE year = PARTITION_DESCRIPTION);

This works, but it takes just as long as the full table scan. I also tried to do a partition select in the subquery but the syntax won't accept a user variable or a reference:

SELECT MAX(PARTITION_DESCRIPTION)
FROM
    INFORMATION_SCHEMA.PARTITIONS AS P
WHERE
    TABLE_NAME = 'fact_registration'
        AND EXISTS(SELECT 1 FROM fact_registration 
                   PARTITION(P.PARTITION_DESCRIPTION));  # syntax error, unexpected '.'

So, how can I eliminate the table scan and find the accurate maximum on the table?


Solution

  • It turns out the answer to this was as simple as taking a closer look at the columns in the INFORMATION_SCHEMA.PARTITIONS table:

    TABLE_ROWS: The number of table rows in the partition.

    For partitioned InnoDB tables, the row count given in the TABLE_ROWS column is only an estimated value used in SQL optimization, and may not always be exact.

    So, for the MyISAM engine in particular, we can find the MAX(year) without a table scan or any additional indexing by using the following query:

    SELECT MAX(PARTITION_DESCRIPTION) AS maxyear
    FROM
        INFORMATION_SCHEMA.PARTITIONS
    WHERE
        TABLE_NAME = 'fact_registration'
            AND TABLE_ROWS > 0;
    

    Note that this only works in certain situations - if you have more than one value in a partition, the fact that there are rows in the partition doesn't necessarily mean that one or the other value is present; and since PARTITION_DESCRIPTION is a longtext column, the integer values you supply will be stored as a comma-separated string.