Search code examples
sqlpostgresqlrdbmsdatabase-partitioning

How to improve read performance on a partitioned (using inheritance) PostgreSQL table over the same version of clustered table with ~50M rows


I have the following tables for storing time series metric data in my PostgreSQL 12 instance -

 CREATE TABLE metric_values
(
    event_id bigint NOT NULL,
    "timestamp" timestamp without time zone NOT NULL,
    value numeric(20,4) NOT NULL
)

CREATE TABLE metric_values_old
(
    event_id bigint NOT NULL,
    "timestamp" timestamp without time zone NOT NULL,
    value numeric(20,4) NOT NULL
)

The two tables are identical with index on event_id and timestamp columns. The difference is metric_values_old is having ~50M rows and is clustered based on the above index, while metric_values is partitioned by inheritance, with individual indexes and clustering created on each child.

Each partition is supposed to hold data for 1 year, derived from the timestamp column.

Here is the definition of one the child tables -

CREATE TABLE metric_values_2021
(
    -- Inherited from table metric_values: event_id bigint NOT NULL,
    -- Inherited from table metric_values: "timestamp" timestamp without time zone NOT NULL,
    -- Inherited from table metric_values: value numeric(20,4) NOT NULL,
    CONSTRAINT metric_values_2021_event_id_timestamp_key UNIQUE (event_id, "timestamp"),
    CONSTRAINT metric_values_2021_timestamp_check CHECK (date_part('year'::text, "timestamp") = 2021)
)
    INHERITS (metric_values)
TABLESPACE pg_default;

CREATE INDEX metric_values_2021_idx
    ON metric_values_2021 USING btree
    (event_id ASC NULLS LAST, "timestamp" ASC NULLS LAST)
    TABLESPACE pg_default;

ALTER TABLE metric_values_2021
    CLUSTER ON metric_values_2021_idx;

But while comparing the query performance for both these tables, the partitioned table is performing worse than the clustered table. I was expecting the performance to be better since the query has conditions on timestamp, and would be able to retrieve from a particular child. I would prefer to go with partitioning since it will be easier to maintain going forward and each child table will have a fixed size unlike the clustered table with 50M rows and growing.

Below are the queries I tried on both tables -

select event_id, timestamp, value from metric_values
WHERE timestamp between '2020-08-01' and '2020-08-31'

select event_id, timestamp, value from metric_values_old
WHERE timestamp between '2020-08-01' and '2020-08-31'

Following are the explain plans -

  1. Clustered table with no partitioning

    enter image description here

  2. Partitioned Table

    enter image description here

It seems for the partitioned table this is scanning all partitions hence the increased cost. Any suggestions on improving this performance will be highly appreciated. Thanks!

Followed the suggestion from @a_horse_with_no_name and was able to resolve the issue for the query -

select event_id, timestamp, value from metric_values
WHERE timestamp between '2020-08-01' and '2020-08-31'

enter image description here

Although the performance is way better, PostgreSQL still seems to scan all partitions. If anybody is aware of the reason or if there is a way to avoid this, do let me know. All your help is much appreciated.


Solution

  • Issue resolved by changing from inheritance to declarative approach. Please refer to the solution suggested by @a_horse_with_no_name here - https://dbfiddle.uk/?rdbms=postgres_13&fiddle=f99f23409e416be31f5bad493083196b