I have a table
T (A int, B int, C long, D varchar)
partitioned by each A
and sub-partitioned by each B
(i.e. list partitions with a single value each). A
has cardinality of <10 and B
has cardinality of <100. T
has about 6 billion rows.
When I run the query
select distinct B from T where A = 1;
it prunes the top-level partitions (those where A != 1
) but performs a table scan on all sub-partitions to find distinct values of B
. I thought it would know, based on the partition design, that it would only have to check the partition constraint to determine the possible values of B
given A
, but alas, that is not the case.
There are no indexes on A
or B
, but there is a primary key on (C,D)
at each partition, which seems immaterial, but figured I should mention it. I also have a BRIN index on C
. Any idea why the Postgres query planner is not consulting the sub-partition constraints to avoid the table scan?
The reason is that nobody implemented such an optimization in the query planner. I cannot say that that surprises me, since it is a rather unusual query. Every such optimization built into the optimizer would mean that each query on a partitioned table that has a DISTINCT
would need some extra query planning time, while only few queries would profit. Apart from the expense of writing and maintaining the code, that would be a net loss for most users.
Maybe you could use a metadata query:
CREATE TABLE list (id bigint NOT NULL, p integer NOT NULL) PARTITION BY LIST (p);
CREATE TABLE list_42 PARTITION OF list FOR VALUES IN (42);
CREATE TABLE list_101 PARTITION OF list FOR VALUES IN (101);
SELECT regexp_replace(
pg_get_expr(
p.relpartbound,
p.oid
),
'^FOR VALUES IN \((.*)\)$',
'\1'
)::integer
FROM pg_class AS p
JOIN pg_inherits AS i ON p.oid = i.inhrelid
WHERE i.inhparent = 'list'::regclass;
regexp_replace
----------------
42
101
(2 rows)