I created a MyISAM table that stores rows in different partitions based on the year
column, with each year given its own partition, e.g.:
CREATE TABLE t (
id INT UNSIGNED AUTO_INCREMENT
, year SMALLINT(4) UNSIGNED
, ...
, PRIMARY KEY (id, year)
) ENGINE=MyISAM
PARTITION BY LIST (year) (
PARTITION p0 VALUES IN (2000)
, PARTITION p1 VALUES IN (2001)
, ...
);
The question I want to answer is simple; what's the greatest value of year
in the table? In other words:
SELECT MAX(year) FROM t;
It seems, from looking at the table definition, that there should be an execution plan for this query that runs in constant time without reordering the PK or creating a new index on year
. All the database needs to do is find the partition that's defined as having the maximum value, and then check whether any rows exist in that partition; if not, check the next largest, and so on. Unfortunately, the optimizer chooses instead to do a full table scan.
I can easily query INFORMATION_SCHEMA
to find the largest value of year
for which there is a defined partition:
SELECT MAX(PARTITION_DESCRIPTION)
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 't';
However, that only tells me about the partitions defined on the table, not the rows that exist in the table; if some partitions could be empty, it's not necessarily going to give me the MAX(year)
. I tried adding an EXISTS
condition:
SELECT MAX(PARTITION_DESCRIPTION)
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 't'
AND EXISTS(SELECT 1 FROM fact_registration
WHERE year = PARTITION_DESCRIPTION);
This works, but it takes just as long as the full table scan. I also tried to do a partition select in the subquery but the syntax won't accept a user variable or a reference:
SELECT MAX(PARTITION_DESCRIPTION)
FROM
INFORMATION_SCHEMA.PARTITIONS AS P
WHERE
TABLE_NAME = 'fact_registration'
AND EXISTS(SELECT 1 FROM fact_registration
PARTITION(P.PARTITION_DESCRIPTION)); # syntax error, unexpected '.'
So, how can I eliminate the table scan and find the accurate maximum on the table?
It turns out the answer to this was as simple as taking a closer look at the columns in the INFORMATION_SCHEMA.PARTITIONS
table:
TABLE_ROWS
: The number of table rows in the partition.For partitioned
InnoDB
tables, the row count given in theTABLE_ROWS
column is only an estimated value used in SQL optimization, and may not always be exact.
So, for the MyISAM engine in particular, we can find the MAX(year)
without a table scan or any additional indexing by using the following query:
SELECT MAX(PARTITION_DESCRIPTION) AS maxyear
FROM
INFORMATION_SCHEMA.PARTITIONS
WHERE
TABLE_NAME = 'fact_registration'
AND TABLE_ROWS > 0;
Note that this only works in certain situations - if you have more than one value in a partition, the fact that there are rows in the partition doesn't necessarily mean that one or the other value is present; and since PARTITION_DESCRIPTION
is a longtext
column, the integer values you supply will be stored as a comma-separated string.