Search code examples
postgresqlperformancetruncation

30 tables with few rows - TRUNCATE the fastest way to empty them and reset attached sequences?


I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible.

I found myself such kind of solution for MySQL, it performs much faster than just truncation of tables one by one. But anyway, I am interested in the fastest solutions for MySQL too. See my result here, of course it it for MySQL only: https://github.com/bmabey/database_cleaner/issues/126

I have following assumptions:

  • I have 30-100 tables. Let them be 30.
  • Half of the tables are empty.
  • Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large.
  • I need an optional possibility to exclude 2 or 5 or N tables from this procedure.

  • I cannot! use transactions.

I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9.

I see the following approaches:

  1. Truncate each table. It is too slow, I think, especially for empty tables.

  2. Check each table for emptiness by more faster method, and then if it is empty, reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence back to 1, otherwise run truncate on it.

I use Ruby code to iterate through all tables, calling code below on each of them, I tried to setup SQL code running against each table like:

DO $$DECLARE r record;
BEGIN
  somehow_captured = SELECT last_value from #{table}_id_seq
  IF (somehow_captured == 1) THEN
    == restore initial unique identifier column value here ==
  END

  IF (somehow_captured > 1) THEN
    TRUNCATE TABLE #{table};
  END IF;
END$$;

Manipulating this code in various aspects, I couldn't make it work, because of I am unfamiliar with PostgreSQL functions and blocks (and variables).

Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one of the "check procedure" units, cleaning procedure should consist of, but haven't accomplished it too.

I would appreciate any hints on how this procedure could be accomplished in PostgreSQL native way.

UPDATE:

I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers, in my case the switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post here the answers about "Obviously, you need DatabaseCleaner for PG" and so on and so on.

UPDATE 2:

The strategy described here recently was merged into DatabaseCleaner, https://github.com/bmabey/database_cleaner as :pre_count option (see README there).


Solution

  • If someone is interested with the current strategy, I use for this, see this Ruby-based repo https://github.com/stanislaw/truncate-vs-count for both MySQL and PostgreSQL.

    My results:

    MySQL: the fastest strategy for cleaning databases is truncation with following modifications:

    if table is not empty
      truncate. 
    else 
      if AUTO_INCREMENT is not 0
        truncate.
      end
    end
    
    • For MySQL just truncation is much faster than just deletion. The only case where DELETE wins over TRUNCATE is doing it on empty table.
    • For MySQL truncation with empty checks is much faster than just multiple truncation.
    • For MySQL deletion with empty checks is much faster than just DELETE on each tables.

    PostgreSQL: The fastest strategy for cleaning databases is deletion with the same empty-checks as for MySQL, but with relying on currval instead:

    if table is not empty
      delete table
    else 
      if currval is not 0
        delete table
      end
    end
    
    • For PostgreSQL just deletion is much faster than just TRUNCATION(even multiple).
    • For PostgreSQL multiple TRUNCATE doing empty checks before is slightly faster than just multiple TRUNCATE
    • For PostgreSQL deletion with empty checks is slightly faster than just PostgreSQL deletion.

    This is from where it began: https://github.com/bmabey/database_cleaner/issues/126

    This is the result code and long discussion: https://github.com/bmabey/database_cleaner/pull/127

    This is discussion on pgsql-performance mailing list: http://archives.postgresql.org/pgsql-performance/2012-07/msg00047.php

    We began collecting users feedback proving my idea with first checking empty tables is right.