I have the following tables for storing time series metric data in my PostgreSQL 12 instance -
CREATE TABLE metric_values
(
event_id bigint NOT NULL,
"timestamp" timestamp without time zone NOT NULL,
value numeric(20,4) NOT NULL
)
CREATE TABLE metric_values_old
(
event_id bigint NOT NULL,
"timestamp" timestamp without time zone NOT NULL,
value numeric(20,4) NOT NULL
)
The two tables are identical with index on event_id and timestamp columns. The difference is metric_values_old is having ~50M rows and is clustered based on the above index, while metric_values is partitioned by inheritance, with individual indexes and clustering created on each child.
Each partition is supposed to hold data for 1 year, derived from the timestamp column.
Here is the definition of one the child tables -
CREATE TABLE metric_values_2021
(
-- Inherited from table metric_values: event_id bigint NOT NULL,
-- Inherited from table metric_values: "timestamp" timestamp without time zone NOT NULL,
-- Inherited from table metric_values: value numeric(20,4) NOT NULL,
CONSTRAINT metric_values_2021_event_id_timestamp_key UNIQUE (event_id, "timestamp"),
CONSTRAINT metric_values_2021_timestamp_check CHECK (date_part('year'::text, "timestamp") = 2021)
)
INHERITS (metric_values)
TABLESPACE pg_default;
CREATE INDEX metric_values_2021_idx
ON metric_values_2021 USING btree
(event_id ASC NULLS LAST, "timestamp" ASC NULLS LAST)
TABLESPACE pg_default;
ALTER TABLE metric_values_2021
CLUSTER ON metric_values_2021_idx;
But while comparing the query performance for both these tables, the partitioned table is performing worse than the clustered table. I was expecting the performance to be better since the query has conditions on timestamp, and would be able to retrieve from a particular child. I would prefer to go with partitioning since it will be easier to maintain going forward and each child table will have a fixed size unlike the clustered table with 50M rows and growing.
Below are the queries I tried on both tables -
select event_id, timestamp, value from metric_values
WHERE timestamp between '2020-08-01' and '2020-08-31'
select event_id, timestamp, value from metric_values_old
WHERE timestamp between '2020-08-01' and '2020-08-31'
Following are the explain plans -
It seems for the partitioned table this is scanning all partitions hence the increased cost. Any suggestions on improving this performance will be highly appreciated. Thanks!
Followed the suggestion from @a_horse_with_no_name and was able to resolve the issue for the query -
select event_id, timestamp, value from metric_values
WHERE timestamp between '2020-08-01' and '2020-08-31'
Although the performance is way better, PostgreSQL still seems to scan all partitions. If anybody is aware of the reason or if there is a way to avoid this, do let me know. All your help is much appreciated.
Issue resolved by changing from inheritance to declarative approach. Please refer to the solution suggested by @a_horse_with_no_name here - https://dbfiddle.uk/?rdbms=postgres_13&fiddle=f99f23409e416be31f5bad493083196b