I am trying to speed up a delete query that appears to be very slow when compared to an identical select query:
Slow delete query:
https://explain.depesz.com/s/kkWJ
delete from processed.token_utxo
where token_utxo.output_tx_time >= (select '2022-03-01T00:00:00+00:00'::timestamp with time zone)
and token_utxo.output_tx_time < (select '2022-03-02T00:00:00+00:00'::timestamp with time zone)
and not exists (
select 1
from public.ma_tx_out
where ma_tx_out.id = token_utxo.id
)
Fast select query: https://explain.depesz.com/s/Bp8q
select * from processed.token_utxo
where token_utxo.output_tx_time >= (select '2022-03-01T00:00:00+00:00'::timestamp with time zone)
and token_utxo.output_tx_time < (select '2022-03-02T00:00:00+00:00'::timestamp with time zone)
and not exists (
select 1
from public.ma_tx_out
where ma_tx_out.id = token_utxo.id
)
Table reference:
create table processed.token_utxo (
id bigint,
tx_out_id bigint,
token_id bigint,
output_tx_id bigint,
output_tx_index int,
output_tx_time timestamp,
input_tx_id bigint,
input_tx_time timestamp,
address varchar,
address_has_script boolean,
payment_cred bytea,
redeemer_id bigint,
stake_address_id bigint,
quantity numeric,
primary key (id)
);
create index token_utxo_output_tx_id on processed.token_utxo using btree (output_tx_id);
create index token_utxo_input_tx_id on processed.token_utxo using btree (input_tx_id);
create index token_utxo_output_tx_time on processed.token_utxo using btree (output_tx_time);
create index token_utxo_input_tx_time on processed.token_utxo using btree (input_tx_time);
create index token_utxo_address on processed.token_utxo using btree (address);
create index token_utxo_token_id on processed.token_utxo using btree (token_id);
Version: PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
Postgres chooses different query plans which results in drastically different performance. I'm not familiar enough with Postgres to understand why it makes this decision. Hoping there is a simple way to guide it towards a better plan here.
I am not exactly sure what triggers the switch of query plan between SELECT
and DELETE
, but I do know this: the subqueries returning a constant value are actively unhelpful. Use instead:
SELECT *
FROM processed.token_utxo t
WHERE t.output_tx_time >= '2022-03-01T00:00:00+00:00'::timestamptz -- no subquery
AND t.output_tx_time < '2022-03-02T00:00:00+00:00'::timestamptz -- no subquery
AND NOT EXISTS (SELECT FROM public.ma_tx_out m WHERE m.id = t.id)
DELETE FROM processed.token_utxo t
WHERE t.output_tx_time >= '2022-03-01T00:00:00+00:00'::timestamptz
AND t.output_tx_time < '2022-03-02T00:00:00+00:00'::timestamptz
AND NOT EXISTS (SELECT FROM public.ma_tx_out m WHERE m.id = t.id)
As you can see in the query plan, Postgres comes up with a generic plan for yet unknown timestamps:
Index Cond: ((output_tx_time >= $0) AND (output_tx_time < $1))
My fixed query allows Postgres to devise a plan for the actual given constant values. If your column statistics are up to date, this allows for more optimization according to the number of rows expected to qualify for that time interval. The query plan will change to:
Index Cond: ((output_tx_time >= '2022-03-01T00:00:00+00:00'::timestamp with time zone) AND (output_tx_time < '2022-03-02T00:00:00+00:00'::timestamp with time zone))
And you will see different row estimates, that may result in a different query plan.
Of course, DELETE
cannot have the exact same plan. Besides the obvious difference that DELETE
has write-lock and write to dying rows, it also cannot (currently - up to at least pg 15) use parallelism, and it cannot use index-only scans. See:
So you'll see an index scan where SELECT
might use an index-only scan.