Search code examples
databasepostgresqlstored-proceduresplpgsql

Delete data from child tables with Stored Procedure


I have a schema named new_schema with a table result. This table has many child tables with inheritance, named like result_23, result_45, etc.

I have 3500 millions of rows and the database coming slow. One row has a timestamp field named new_date. This start from '2022-01-01'. I want to delete all rows before: 2023-11-01.

Deleting from the parent table takes too long. I try to delete from child tables directly in a SP, executed with the postgres superuser. If something happens, I can continue the work later.

CREATE OR REPLACE FUNCTION new_schema.delete_old_rows()

RETURNS TABLE (child_table text)
LANGUAGE plpgsql
AS $function$
DECLARE
    child_table text;
    sql_query text;
BEGIN
    FOR child_table IN
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = 'new_schema'
        AND table_name LIKE 'result_%'
    loop
        sql_query := 'DELETE FROM new_schema.' || child_table || ' WHERE new_date < ''2023-11-01'';';
        EXECUTE consulta_sql;
        RAISE NOTICE 'Data deleted in table: %', child_table;
    END LOOP;
END
$function$;

I DELETE from each child table in a loop, and it seems to work (though it takes long for big tables). But when I check with a SELECT, data from 2022 is still there.

When printing the query string, the DELETE looks perfect:

CONTEXT: SQL statement "DELETE FROM new_schema.result_26 WHERE new_date < '2023-11-01';"

I tried to apply COMMIT after each DELETE, but that does not work.


Solution

  • You want a PROCEDURE instead of a FUNCTION. There you can issue COMMIT. See:

    Could work like this:

    CREATE OR REPLACE PROCEDURE new_schema.delete_old_rows()  -- !!!
      -- no RETURNS clause  -- !!!
      LANGUAGE plpgsql AS
    $func$
    DECLARE
       _child text;
       _row_ct int;
    BEGIN
       FOR _child IN
          SELECT quote_ident(table_name)  -- !!!
          FROM   information_schema.tables
          WHERE  table_schema = 'new_schema'
          AND    table_name LIKE 'result_%'
       LOOP
          EXECUTE 'DELETE FROM new_schema.' || _child || ' WHERE new_date < ''2023-11-01''';      
          GET DIAGNOSTICS _row_ct = ROW_COUNT;
          COMMIT;  -- !!!
          RAISE NOTICE '% rows deleted from table: %', _row_ct, _child;
       END LOOP;
    END
    $func$;
    

    Execute with CALL (important!):

    CALL new_schema.delete_old_rows();
    

    Also note that your original was open to SQL injection attacks. Identifiers have to be treated as user input in dynamic SQL. See:

    And you confused variable names: consulta_sql vs. sql_query. I simplified.
    While being at it, I added a row count (practically for free). See: