Search code examples
postgresqlindexingpartitioning

Partition scanning very slow


I have a large table partitioned at two levels, first by time:

CREATE TABLE IF NOT EXISTS research.ratios
(
    security_id integer NOT NULL,
    period_id smallint NOT NULL,
    classification_id smallint NOT NULL,
    dtz timestamp with time zone NOT NULL,
    create_dt timestamp with time zone NOT NULL DEFAULT now(),
    update_dt timestamp with time zone NOT NULL DEFAULT now(),
    ratio_value real,
    latest_record boolean NOT NULL DEFAULT false,
    universe_weight real,
    CONSTRAINT ratios_primarykey PRIMARY KEY (dtz, security_id, period_id),
    CONSTRAINT ratios_classification_id_fkey FOREIGN KEY (classification_id)
        REFERENCES settings.classification (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT ratios_period_id_fkey FOREIGN KEY (period_id)
        REFERENCES settings.periods (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT ratios_security_id_fkey FOREIGN KEY (security_id)
        REFERENCES settings.securitymaster (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
) PARTITION BY RANGE (dtz);

CREATE TABLE IF NOT EXISTS zpart.ratios_y1990 PARTITION OF research.ratios
    FOR VALUES FROM ('1990-01-01 00:00:00+00') TO ('1991-01-01 00:00:00+00')
    PARTITION BY LIST (period_id); 

CREATE TABLE IF NOT EXISTS zpart.ratios_y1991 PARTITION OF research.ratios
    FOR VALUES FROM ('1991-01-01 00:00:00+00') TO ('1992-01-01 00:00:00+00')
    PARTITION BY LIST (period_id); 

etc up to today

Each of those tables is then partitioned by a period_id

CREATE TABLE IF NOT EXISTS zpart.ratios_y1990p1 PARTITION OF zpart.ratios_y1990
    FOR VALUES IN ('1');
CREATE TABLE IF NOT EXISTS zpart.ratios_y1990p11 PARTITION OF zpart.ratios_y1990
    FOR VALUES IN ('11');

etc

Finally, note that the actual tables have a primary key:

CONSTRAINT ratios_primarykey PRIMARY KEY (dtz, security_id, period_id)

It works well for most use cases. However when we are collecting a full history by security_id it takes a long time, and some queries in particular make me think that I have got the wrong indexes.

For example, occasionally we will update all data for a security_id and we want to clear any old data that might have been invalidly added. Below is an example where I've updated a security_id in each valid partition and I want to clear data in any earlier partitions just in case:

delete from research.ratios where security_id=10450 and dtz<'2017-01-01';

In this example, there are 189 tables to check, and none of them have any data. I would have thought that the query would check each primary key index for the 189 tables, see that there are no records for security_id 10450 and then finish. However this action takes over a minute to run which is making me think that my indexes are not what I thought they were.

Here is a cut down version of the explain analyze:

QUERY PLAN
Delete on ratios  (cost=8118.25..4049804.56 rows=0 width=0) (actual time=67117.836..67117.938 rows=0 loops=1)
  Delete on ratios_y1990p1 ratios_1
  Delete on ratios_y1990p9 ratios_2
  ...
  Delete on ratios_y2016p15 ratios_188
  Delete on ratios_y2016p17 ratios_189
  ->  Append  (cost=8118.25..4049804.56 rows=48564 width=10) (actual time=67117.829..67117.931 rows=0 loops=1)
        ->  Bitmap Heap Scan on ratios_y1990p1 ratios_1  (cost=8118.25..9048.95 rows=268 width=10) (actual time=271.742..271.743 rows=0 loops=1)
              Recheck Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
              ->  Bitmap Index Scan on ratios_y1990p1_pkey  (cost=0.00..8118.18 rows=268 width=0) (actual time=271.738..271.738 rows=0 loops=1)
                    Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Bitmap Heap Scan on ratios_y1990p9 ratios_2  (cost=6554.82..7340.83 rows=232 width=10) (actual time=9.389..9.390 rows=0 loops=1)
              Recheck Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
              ->  Bitmap Index Scan on ratios_y1990p9_pkey  (cost=0.00..6554.76 rows=232 width=0) (actual time=9.384..9.384 rows=0 loops=1)
                    Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Bitmap Heap Scan on ratios_y1990p11 ratios_3  (cost=6907.45..7694.32 rows=232 width=10) (actual time=7.762..7.762 rows=0 loops=1)
              Recheck Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
              ->  Bitmap Index Scan on ratios_y1990p11_pkey  (cost=0.00..6907.39 rows=232 width=0) (actual time=7.757..7.757 rows=0 loops=1)
                    Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Bitmap Heap Scan on ratios_y1990p12 ratios_4  (cost=7815.19..8617.27 rows=237 width=10) (actual time=6.257..6.258 rows=0 loops=1)
              Recheck Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
              ->  Bitmap Index Scan on ratios_y1990p12_pkey  (cost=0.00..7815.13 rows=237 width=0) (actual time=6.253..6.253 rows=0 loops=1)
                    Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Seq Scan on ratios_y1990p14 ratios_5  (cost=0.00..7059.28 rows=257 width=10) (actual time=23.991..23.991 rows=0 loops=1)
              Filter: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
              Rows Removed by Filter: 191152
        ->  Seq Scan on ratios_y1990p15 ratios_6  (cost=0.00..6500.28 rows=257 width=10) (actual time=23.207..23.208 rows=0 loops=1)
              Filter: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
              Rows Removed by Filter: 191152
        ->  Seq Scan on ratios_y1990p17 ratios_7  (cost=0.00..7468.97 rows=234 width=10) (actual time=28.048..28.048 rows=0 loops=1)
              Filter: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
              Rows Removed by Filter: 191571

...

        ->  Index Scan using ratios_y2013p11_pkey on ratios_y2013p11 ratios_164  (cost=0.42..23840.35 rows=261 width=10) (actual time=53.111..53.111 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2013p12_pkey on ratios_y2013p12 ratios_165  (cost=0.42..25032.72 rows=265 width=10) (actual time=642.015..642.015 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2013p14_pkey on ratios_y2013p14 ratios_166  (cost=0.42..25227.62 rows=259 width=10) (actual time=87.942..87.943 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2013p15_pkey on ratios_y2013p15 ratios_167  (cost=0.42..24860.51 rows=259 width=10) (actual time=43.079..43.080 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2013p17_pkey on ratios_y2013p17 ratios_168  (cost=0.42..23706.46 rows=258 width=10) (actual time=77.406..77.407 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2014p1_pkey on ratios_y2014p1 ratios_169  (cost=0.42..25432.35 rows=258 width=10) (actual time=112.056..112.056 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2014p9_pkey on ratios_y2014p9 ratios_170  (cost=0.42..23804.65 rows=261 width=10) (actual time=98.315..98.315 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2014p11_pkey on ratios_y2014p11 ratios_171  (cost=0.42..23990.53 rows=261 width=10) (actual time=78.770..78.770 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2014p12_pkey on ratios_y2014p12 ratios_172  (cost=0.42..24980.20 rows=257 width=10) (actual time=298.219..298.219 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2014p14_pkey on ratios_y2014p14 ratios_173  (cost=0.42..25452.75 rows=257 width=10) (actual time=40.636..40.636 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2014p15_pkey on ratios_y2014p15 ratios_174  (cost=0.42..25024.75 rows=260 width=10) (actual time=791.547..791.548 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2014p17_pkey on ratios_y2014p17 ratios_175  (cost=0.42..23898.95 rows=266 width=10) (actual time=48.073..48.073 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2015p1_pkey on ratios_y2015p1 ratios_176  (cost=0.42..25477.92 rows=257 width=10) (actual time=53.338..53.339 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2015p9_pkey on ratios_y2015p9 ratios_177  (cost=0.42..23937.18 rows=260 width=10) (actual time=45.046..45.046 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2015p11_pkey on ratios_y2015p11 ratios_178  (cost=0.42..23910.06 rows=258 width=10) (actual time=46.683..46.683 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2015p12_pkey on ratios_y2015p12 ratios_179  (cost=0.42..25184.07 rows=259 width=10) (actual time=50.504..50.504 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2015p14_pkey on ratios_y2015p14 ratios_180  (cost=0.42..25492.90 rows=258 width=10) (actual time=49.115..49.115 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2015p15_pkey on ratios_y2015p15 ratios_181  (cost=0.42..25011.61 rows=259 width=10) (actual time=375.470..375.470 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2015p17_pkey on ratios_y2015p17 ratios_182  (cost=0.42..23847.07 rows=268 width=10) (actual time=44.908..44.908 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2016p1_pkey on ratios_y2016p1 ratios_183  (cost=0.42..25192.72 rows=258 width=10) (actual time=57.975..57.975 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2016p9_pkey on ratios_y2016p9 ratios_184  (cost=0.42..24203.92 rows=264 width=10) (actual time=47.379..47.379 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2016p11_pkey on ratios_y2016p11 ratios_185  (cost=0.42..23961.60 rows=260 width=10) (actual time=44.402..44.402 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2016p12_pkey on ratios_y2016p12 ratios_186  (cost=0.42..25132.85 rows=262 width=10) (actual time=50.321..50.321 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2016p14_pkey on ratios_y2016p14 ratios_187  (cost=0.42..25185.69 rows=260 width=10) (actual time=46.699..46.699 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2016p15_pkey on ratios_y2016p15 ratios_188  (cost=0.42..25009.06 rows=260 width=10) (actual time=377.000..377.000 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
        ->  Index Scan using ratios_y2016p17_pkey on ratios_y2016p17 ratios_189  (cost=0.42..23785.12 rows=262 width=10) (actual time=45.472..45.472 rows=0 loops=1)
              Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
Planning Time: 66.162 ms
Execution Time: 67119.554 ms

Just confirming: there are zero rows affected by the above.

I'm guessing that I have set up the table/indexes incorrectly. I'm looking for advice on how to manage the partitions/indexes better so that the query can exclude partitions quickly if there are no records for a security_id? Note there are billions of records, and growing fast.

Postgres 14.1


Solution

  • Since the security_id is not the first column in the btree index, it can't just jump to the part of the index where all of the given security_id values are, because there is no such singular place. You would need another index leading with security_id, or rearrange the order of columns in your primary key (not an easy task to do retroactively).