I have a scenario like this:
SELECT * FROM PACKAGE WHERE PACKAGE_TYPE IN ('BOX','CARD')
The table is partitioned by PACKAGE_TYPE
field. Assume that there are twenty possible values for PACKAGE_TYPE
field. So there are twenty partitions including BOX
, CARD
and DEFAULT
partitions. When the above query is run, partition elimination happens correctly and only the BOX
and CARD
partitions get scanned. The result is quick.
However, when the same query is written like this:
SELECT * FROM PACKAGE WHERE PACKAGE_TYPE IN (SELECT PACKAGE_TYPE FROM PACKAGE_LIST_TABLE)
, where the column PACKAGE_TYPE
in PACKAGE_LIST_TABLE
contains two values BOX
and CARD
.
When the above query is run, all the 20 partitions are being scanned. It degrades the performance.
It seems that the compiler is failing to identify the second query correctly and as a result all the partitions are getting accessed.
Any workarounds to overcome this?
Thanks in advance.
The Postgres manual page on Partitioning includes this caveat
Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which partition the function value might fall into at run time.
In order to eliminate a seek on a partition, Postgres must know when creating a query plan that no rows from that partition are relevant. In your query, this occurs only after the sub-query has completed, so the query would have to be split into two, with the second part planned only after the first completes.
If the partitions include an index on the partitioned column (PACKAGE_TYPE
) as well as a constraint, the planner may elect to use an index scan on each partition, leading to the incorrect partitions being reasonably efficiently eliminated at runtime anyway. (That is, there would be 20 index scans, but each would require very little resource.)
An alternative would be to split the query yourself, and build the SQL dynamically. Since the SELECT PACKAGE_TYPE FROM PACKAGE_LIST_TABLE
can only ever return up to 20 distinct values, you could select those into an array/set in your application or a user-defined function. Then you can pass these in as literals in the IN ( ... )
clause as in your first example (or equivalently = ANY(array_expression)
), and achieve the partition elimination.