Search code examples
sqlfirebirdfirebird2.5

Stored Procedure for batch delete in Firebird


I need to delete a bunch of records (literally millions) but I don't want to make it in an individual statement, because of performance issues. So I created a view:

CREATE VIEW V1 
AS 
    SELECT FIRST 500000 * 
    FROM TABLE 
    WHERE W_ID = 14

After that I do a bunch deletes for example:

DELETE FROM V1 WHERE TS < 2021-01-01 

What I want is to import this logic in a While loop and in stored procedure. I tried SELECT COUNT query like this:

SELECT COUNT(*) 
FROM TABLE 
WHERE W_ID = 14 AND TS < 2021-01-01;

Can I use this number in the same procedure as a condition and how can I manage that?

This is what I have tried and I get an error

ERROR: Dynamic SQL Error; SQL error code = -104; Token unknown; WHILE

Code:

CREATE PROCEDURE DeleteBatch
  AS
  DECLARE VARIABLE CNT INT;
  BEGIN
       SELECT COUNT(*) FROM TABLE WHERE W_ID = 14 AND TS < 2021-01-01 INTO :cnt;
        WHILE cnt > 0 do
         BEGIN
          IF (cnt > 0) THEN
          DELETE FROM V1 WHERE TS < 2021-01-01;
         END
     ELSE break;
  END

I just can't wrap my head around this.

To clarify, in my previous question I wanted to know how to manage the garbage_collection after many deleted records, and I did what was suggested - SELECT * FROM TABLE; or gfix -sweep and that worked very well. As mentioned in the comments the correct statement is SELECT COUNT(*) FROM TABLE;

After that another even bigger database was given to me - above 50 million. And the problem was the DB was very slow to operate with. And I managed to get the server it was on, killed with a DELETE statement to clean the database.

That's why I wanted to try deleting in batches. The slow-down problem there was purely hardware - HDD has gone, and we replaced it. After that there was no problem with executing statements and doing backup and restore to reclaim disk space.


Solution

  • because of performance issues

    What are those exactly? I do not think you actually are improving performance, by just running delete in loops but within the same transaction, or even different TXs but within the same timespan. You seem to be solving some wrong problem. The issue is not how you create "garbage", but how and when Firebird "collects" it.

    For example, Select Count(*) in Interbase/Firebird engines means natural scan over all the table and the garbage collection is often trigggered by it, which can itself get long if lot of garbage was created (and massive delete surely does, no matter if done by one million-rows statement or million of one-row statements).

    How to delete large data from Firebird SQL database

    If you really want to slow down deletion - you have to spread that activity round the clock, and make your client application call a deleting SP for example once every 15 minutes. You would have to add some column to the table, flagging it is marked for deletion and then do the job like that

    CREATE PROCEDURE DeleteBatch(CNT INT)
    AS
    DECLARE ROW_ID INTEGER;
    BEGIN
      FOR SELECT ID FROM TABLENAME WHERE MARKED_TO_DEL > 0 INTO :row_id
      DO BEGIN
         CNT = CNT - 1;
         DELETE FROM TABLENAME WHERE ID = :ROW_ID;
         IF (CNT <= 0) THEN LEAVE;  
      END
      SELECT COUNT(1) FROM TABLENAME INTO :ROW_id; /* force GC now */
    END
    

    ...and every 15 minutes you do EXECUTE PROCEDURE DeleteBatch(1000).

    Overall this probably would only be slower, because of single-row "precision targeting" - but at least it would spread the delays.