sql postgresql sql-execution-plan postgresql-performance

Identical SELECT vs DELETE query creates different query plans with vastly different execution time

I am trying to speed up a delete query that appears to be very slow when compared to an identical select query:

Slow delete query:
https://explain.depesz.com/s/kkWJ

delete from processed.token_utxo
where token_utxo.output_tx_time >= (select '2022-03-01T00:00:00+00:00'::timestamp with time zone)
  and token_utxo.output_tx_time < (select '2022-03-02T00:00:00+00:00'::timestamp with time zone)
  and not exists (
    select 1
    from public.ma_tx_out
    where ma_tx_out.id = token_utxo.id
  )

Fast select query: https://explain.depesz.com/s/Bp8q

select * from processed.token_utxo
where token_utxo.output_tx_time >= (select '2022-03-01T00:00:00+00:00'::timestamp with time zone)
  and token_utxo.output_tx_time < (select '2022-03-02T00:00:00+00:00'::timestamp with time zone)
  and not exists (
    select 1
    from public.ma_tx_out
    where ma_tx_out.id = token_utxo.id
  )

Table reference:

create table processed.token_utxo (
  id bigint,
  tx_out_id bigint,
  token_id bigint,
  output_tx_id bigint,
  output_tx_index int,
  output_tx_time timestamp,
  input_tx_id bigint,
  input_tx_time timestamp,
  address varchar,
  address_has_script boolean,
  payment_cred bytea,
  redeemer_id bigint,
  stake_address_id bigint,
  quantity numeric,
  primary key (id)
);

create index token_utxo_output_tx_id on processed.token_utxo using btree (output_tx_id);
create index token_utxo_input_tx_id on processed.token_utxo using btree (input_tx_id);
create index token_utxo_output_tx_time on processed.token_utxo using btree (output_tx_time);
create index token_utxo_input_tx_time on processed.token_utxo using btree (input_tx_time);
create index token_utxo_address on processed.token_utxo using btree (address);
create index token_utxo_token_id on processed.token_utxo using btree (token_id);

Version: PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit

Postgres chooses different query plans which results in drastically different performance. I'm not familiar enough with Postgres to understand why it makes this decision. Hoping there is a simple way to guide it towards a better plan here.

Solution

I am not exactly sure what triggers the switch of query plan between SELECT and DELETE, but I do know this: the subqueries returning a constant value are actively unhelpful. Use instead:

SELECT *
FROM   processed.token_utxo t
WHERE  t.output_tx_time >= '2022-03-01T00:00:00+00:00'::timestamptz  -- no subquery
AND    t.output_tx_time <  '2022-03-02T00:00:00+00:00'::timestamptz  -- no subquery
AND    NOT EXISTS (SELECT FROM public.ma_tx_out m WHERE m.id = t.id)

DELETE FROM processed.token_utxo t
WHERE t.output_tx_time >= '2022-03-01T00:00:00+00:00'::timestamptz
AND   t.output_tx_time <  '2022-03-02T00:00:00+00:00'::timestamptz
AND   NOT EXISTS (SELECT FROM public.ma_tx_out m WHERE m.id = t.id)

As you can see in the query plan, Postgres comes up with a generic plan for yet unknown timestamps:

Index Cond: ((output_tx_time >= $0) AND (output_tx_time < $1))

My fixed query allows Postgres to devise a plan for the actual given constant values. If your column statistics are up to date, this allows for more optimization according to the number of rows expected to qualify for that time interval. The query plan will change to:

Index Cond: ((output_tx_time >= '2022-03-01T00:00:00+00:00'::timestamp with time zone) AND (output_tx_time < '2022-03-02T00:00:00+00:00'::timestamp with time zone))

And you will see different row estimates, that may result in a different query plan.

Of course, DELETE cannot have the exact same plan. Besides the obvious difference that DELETE has write-lock and write to dying rows, it also cannot (currently - up to at least pg 15) use parallelism, and it cannot use index-only scans. See:

Delete using another table with index

So you'll see an index scan where SELECT might use an index-only scan.