I have two tables, one containing activities, the other containing their timeslots. Those timeslots have a "last updated" timestamp and are linked to their respective activities via foreign key. Different activities have different suppliers. There are roughly 500k activities and 5 million timelots. I'm attempting to execute the following query:
DELETE FROM "timelots" USING "activities" WHERE
"activities".inventory_supplier = 'Supplier' AND
"timelots".last_updated < '2022-05-01T00:00:00'::timestamp
And as you can guess from the title, my issue is it's just not finishing in a reasonable time. I waited for 6 hours before giving up.
I've made sure that the relevant fields are indexed (last_updated, inventory_supplier, foreign key) and that I don't have a hardware bottleneck. I've tried deleting a single record from timeslots and making a similar select query instead of delete, and both of those queries executed almost instantaneously.
select * FROM "timeslots" join "activities"
on "timelots".activity_id = "activities".uuid where inventory_supplier = 'Supplier'
AND "timelots".last_updated < '2022-05-01T00:00:00'::timestamp
Running explain I got back this result:
Delete on "timeslots" (cost=0.42..601781578.87 rows=48130075630 width=12)
-> Nested Loop (cost=0.42..601781578.87 rows=48130075630 width=12)
-> Seq Scan on "timeslots" (cost=0.00..111661.24 rows=2683135 width=6)
Filter: (last_updated < '2022-05-01 00:00:00'::timestamp without time zone)
-> Materialize (cost=0.42..44017.10 rows=17938 width=6)
-> Index Scan using activities_invento_4bc27d_idx on "activities" (cost=0.42..43927.41 rows=17938 width=6)
Index Cond: ((inventory_supplier)::text = 'Supplier'::text)
I've looked everywhere for an answer. There are no delete triggers, missing indexes on foreign keys or cascading deletes. Everything seems in order, I'm not sure what I'm missing. Database is running on an AWS RDS instance with 32 gigs of RAM, Postgres version is 11.13
You forgot the join condition in your DELETE
statement. You need to add it as an additional WHERE
condition:
... AND timeslots.activity_id = activities.uuid