Search code examples
mysqlsqlquery-optimization

How to optimize a delete query with a subselect?


This query needs to delete over 17 million rows, from a table containing 20 million.

DELETE
FROM statements
WHERE agreement_id IN
    (SELECT id
     FROM agreements
     WHERE created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH));


DELETE
FROM agreements
WHERE created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH)

It takes hours to run, am I missing something that could speed things up a bit?

The subselect by itself takes a few seconds, I don't understand why the delete takes so long.


Solution

  • If you have this much delete to be undertaken. I suggest you to:

    1. create new temporary table with the data which will stay.
    2. Truncate your main table
    3. Move data from temporary table to your main table

    or

    1. create new temporary table with the data which will stay.
    2. Drop your main table
    3. Rename your Temp table as main table (dont forget to create constraints)

    Also for your query,

    never use IN clause for BIG data. Instead use exists which is more performant.

    Basic script:

    CREATE TABLE tmp_statements as
      SELECT * FROM statements s where exists 
      (
         select 1 FROM agreements a 
         WHERE 
           created < DATE_SUB(CURDATE(), INTERVAL 6  MONTH AND
           s.agreement_id = a.agreement_id
      ));
    
     DROP TABLE statements;
    
     RENAME TABLE tmp_statements TO statements ;
    
     --DONT FORGET TO RECREATE  CREATE YOUR INDEXES, CONSTRAINTS;