Search code examples
sqlpostgresqlanti-join

SQL Anti-join Delete Optimisation


I have two tables in a postgres database, posts and users. posts has a user_id foreign key that references the users.id primary key column. Both tables are very large.

I have just deleted a random set of users (about 80% of the total users) and I want to delete all posts that reference deleted users, effectively an anti-join and delete. What is the most efficient way of doing this?

Currently I have this:

DELETE FROM posts l
WHERE NOT EXISTS
  (
     SELECT NULL
     FROM users r
     WHERE r.id = l.user_id
  )

Is there a more efficient way of doing this?


Solution

  • If you want to delete 80% of users, then the fastest way is probably:

    create table temp_posts as 
        select p.*
        from posts p
        where exists (select 1 from users u where u.id = p.user_id);
    
    truncate table posts;
    
    insert into posts
        select *
        from temp_posts;
    

    Batch inserts are much less work than updating most of the rows in the table. Of course, you should test this carefully. Truncating the table is a fast way to remove all the rows from it.