I have a large table partitioned at two levels, first by time:
CREATE TABLE IF NOT EXISTS research.ratios
(
security_id integer NOT NULL,
period_id smallint NOT NULL,
classification_id smallint NOT NULL,
dtz timestamp with time zone NOT NULL,
create_dt timestamp with time zone NOT NULL DEFAULT now(),
update_dt timestamp with time zone NOT NULL DEFAULT now(),
ratio_value real,
latest_record boolean NOT NULL DEFAULT false,
universe_weight real,
CONSTRAINT ratios_primarykey PRIMARY KEY (dtz, security_id, period_id),
CONSTRAINT ratios_classification_id_fkey FOREIGN KEY (classification_id)
REFERENCES settings.classification (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT ratios_period_id_fkey FOREIGN KEY (period_id)
REFERENCES settings.periods (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT ratios_security_id_fkey FOREIGN KEY (security_id)
REFERENCES settings.securitymaster (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
) PARTITION BY RANGE (dtz);
CREATE TABLE IF NOT EXISTS zpart.ratios_y1990 PARTITION OF research.ratios
FOR VALUES FROM ('1990-01-01 00:00:00+00') TO ('1991-01-01 00:00:00+00')
PARTITION BY LIST (period_id);
CREATE TABLE IF NOT EXISTS zpart.ratios_y1991 PARTITION OF research.ratios
FOR VALUES FROM ('1991-01-01 00:00:00+00') TO ('1992-01-01 00:00:00+00')
PARTITION BY LIST (period_id);
etc up to today
Each of those tables is then partitioned by a period_id
CREATE TABLE IF NOT EXISTS zpart.ratios_y1990p1 PARTITION OF zpart.ratios_y1990
FOR VALUES IN ('1');
CREATE TABLE IF NOT EXISTS zpart.ratios_y1990p11 PARTITION OF zpart.ratios_y1990
FOR VALUES IN ('11');
etc
Finally, note that the actual tables have a primary key:
CONSTRAINT ratios_primarykey PRIMARY KEY (dtz, security_id, period_id)
It works well for most use cases. However when we are collecting a full history by security_id it takes a long time, and some queries in particular make me think that I have got the wrong indexes.
For example, occasionally we will update all data for a security_id and we want to clear any old data that might have been invalidly added. Below is an example where I've updated a security_id in each valid partition and I want to clear data in any earlier partitions just in case:
delete from research.ratios where security_id=10450 and dtz<'2017-01-01';
In this example, there are 189 tables to check, and none of them have any data. I would have thought that the query would check each primary key index for the 189 tables, see that there are no records for security_id 10450 and then finish. However this action takes over a minute to run which is making me think that my indexes are not what I thought they were.
Here is a cut down version of the explain analyze:
QUERY PLAN
Delete on ratios (cost=8118.25..4049804.56 rows=0 width=0) (actual time=67117.836..67117.938 rows=0 loops=1)
Delete on ratios_y1990p1 ratios_1
Delete on ratios_y1990p9 ratios_2
...
Delete on ratios_y2016p15 ratios_188
Delete on ratios_y2016p17 ratios_189
-> Append (cost=8118.25..4049804.56 rows=48564 width=10) (actual time=67117.829..67117.931 rows=0 loops=1)
-> Bitmap Heap Scan on ratios_y1990p1 ratios_1 (cost=8118.25..9048.95 rows=268 width=10) (actual time=271.742..271.743 rows=0 loops=1)
Recheck Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Bitmap Index Scan on ratios_y1990p1_pkey (cost=0.00..8118.18 rows=268 width=0) (actual time=271.738..271.738 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Bitmap Heap Scan on ratios_y1990p9 ratios_2 (cost=6554.82..7340.83 rows=232 width=10) (actual time=9.389..9.390 rows=0 loops=1)
Recheck Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Bitmap Index Scan on ratios_y1990p9_pkey (cost=0.00..6554.76 rows=232 width=0) (actual time=9.384..9.384 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Bitmap Heap Scan on ratios_y1990p11 ratios_3 (cost=6907.45..7694.32 rows=232 width=10) (actual time=7.762..7.762 rows=0 loops=1)
Recheck Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Bitmap Index Scan on ratios_y1990p11_pkey (cost=0.00..6907.39 rows=232 width=0) (actual time=7.757..7.757 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Bitmap Heap Scan on ratios_y1990p12 ratios_4 (cost=7815.19..8617.27 rows=237 width=10) (actual time=6.257..6.258 rows=0 loops=1)
Recheck Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Bitmap Index Scan on ratios_y1990p12_pkey (cost=0.00..7815.13 rows=237 width=0) (actual time=6.253..6.253 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Seq Scan on ratios_y1990p14 ratios_5 (cost=0.00..7059.28 rows=257 width=10) (actual time=23.991..23.991 rows=0 loops=1)
Filter: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
Rows Removed by Filter: 191152
-> Seq Scan on ratios_y1990p15 ratios_6 (cost=0.00..6500.28 rows=257 width=10) (actual time=23.207..23.208 rows=0 loops=1)
Filter: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
Rows Removed by Filter: 191152
-> Seq Scan on ratios_y1990p17 ratios_7 (cost=0.00..7468.97 rows=234 width=10) (actual time=28.048..28.048 rows=0 loops=1)
Filter: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
Rows Removed by Filter: 191571
...
-> Index Scan using ratios_y2013p11_pkey on ratios_y2013p11 ratios_164 (cost=0.42..23840.35 rows=261 width=10) (actual time=53.111..53.111 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2013p12_pkey on ratios_y2013p12 ratios_165 (cost=0.42..25032.72 rows=265 width=10) (actual time=642.015..642.015 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2013p14_pkey on ratios_y2013p14 ratios_166 (cost=0.42..25227.62 rows=259 width=10) (actual time=87.942..87.943 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2013p15_pkey on ratios_y2013p15 ratios_167 (cost=0.42..24860.51 rows=259 width=10) (actual time=43.079..43.080 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2013p17_pkey on ratios_y2013p17 ratios_168 (cost=0.42..23706.46 rows=258 width=10) (actual time=77.406..77.407 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2014p1_pkey on ratios_y2014p1 ratios_169 (cost=0.42..25432.35 rows=258 width=10) (actual time=112.056..112.056 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2014p9_pkey on ratios_y2014p9 ratios_170 (cost=0.42..23804.65 rows=261 width=10) (actual time=98.315..98.315 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2014p11_pkey on ratios_y2014p11 ratios_171 (cost=0.42..23990.53 rows=261 width=10) (actual time=78.770..78.770 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2014p12_pkey on ratios_y2014p12 ratios_172 (cost=0.42..24980.20 rows=257 width=10) (actual time=298.219..298.219 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2014p14_pkey on ratios_y2014p14 ratios_173 (cost=0.42..25452.75 rows=257 width=10) (actual time=40.636..40.636 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2014p15_pkey on ratios_y2014p15 ratios_174 (cost=0.42..25024.75 rows=260 width=10) (actual time=791.547..791.548 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2014p17_pkey on ratios_y2014p17 ratios_175 (cost=0.42..23898.95 rows=266 width=10) (actual time=48.073..48.073 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2015p1_pkey on ratios_y2015p1 ratios_176 (cost=0.42..25477.92 rows=257 width=10) (actual time=53.338..53.339 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2015p9_pkey on ratios_y2015p9 ratios_177 (cost=0.42..23937.18 rows=260 width=10) (actual time=45.046..45.046 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2015p11_pkey on ratios_y2015p11 ratios_178 (cost=0.42..23910.06 rows=258 width=10) (actual time=46.683..46.683 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2015p12_pkey on ratios_y2015p12 ratios_179 (cost=0.42..25184.07 rows=259 width=10) (actual time=50.504..50.504 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2015p14_pkey on ratios_y2015p14 ratios_180 (cost=0.42..25492.90 rows=258 width=10) (actual time=49.115..49.115 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2015p15_pkey on ratios_y2015p15 ratios_181 (cost=0.42..25011.61 rows=259 width=10) (actual time=375.470..375.470 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2015p17_pkey on ratios_y2015p17 ratios_182 (cost=0.42..23847.07 rows=268 width=10) (actual time=44.908..44.908 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2016p1_pkey on ratios_y2016p1 ratios_183 (cost=0.42..25192.72 rows=258 width=10) (actual time=57.975..57.975 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2016p9_pkey on ratios_y2016p9 ratios_184 (cost=0.42..24203.92 rows=264 width=10) (actual time=47.379..47.379 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2016p11_pkey on ratios_y2016p11 ratios_185 (cost=0.42..23961.60 rows=260 width=10) (actual time=44.402..44.402 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2016p12_pkey on ratios_y2016p12 ratios_186 (cost=0.42..25132.85 rows=262 width=10) (actual time=50.321..50.321 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2016p14_pkey on ratios_y2016p14 ratios_187 (cost=0.42..25185.69 rows=260 width=10) (actual time=46.699..46.699 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2016p15_pkey on ratios_y2016p15 ratios_188 (cost=0.42..25009.06 rows=260 width=10) (actual time=377.000..377.000 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
-> Index Scan using ratios_y2016p17_pkey on ratios_y2016p17 ratios_189 (cost=0.42..23785.12 rows=262 width=10) (actual time=45.472..45.472 rows=0 loops=1)
Index Cond: ((dtz < '2017-01-01 00:00:00+00'::timestamp with time zone) AND (security_id = 10450))
Planning Time: 66.162 ms
Execution Time: 67119.554 ms
Just confirming: there are zero rows affected by the above.
I'm guessing that I have set up the table/indexes incorrectly. I'm looking for advice on how to manage the partitions/indexes better so that the query can exclude partitions quickly if there are no records for a security_id? Note there are billions of records, and growing fast.
Postgres 14.1
Since the security_id is not the first column in the btree index, it can't just jump to the part of the index where all of the given security_id values are, because there is no such singular place. You would need another index leading with security_id, or rearrange the order of columns in your primary key (not an easy task to do retroactively).