I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible.
I found myself such kind of solution for MySQL, it performs much faster than just truncation of tables one by one. But anyway, I am interested in the fastest solutions for MySQL too. See my result here, of course it it for MySQL only: https://github.com/bmabey/database_cleaner/issues/126
I have following assumptions:
I need an optional possibility to exclude 2 or 5 or N tables from this procedure.
I cannot! use transactions.
I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9.
I see the following approaches:
Truncate each table. It is too slow, I think, especially for empty tables.
Check each table for emptiness by more faster method, and then if it is empty, reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence back to 1, otherwise run truncate on it.
I use Ruby code to iterate through all tables, calling code below on each of them, I tried to setup SQL code running against each table like:
DO $$DECLARE r record;
BEGIN
somehow_captured = SELECT last_value from #{table}_id_seq
IF (somehow_captured == 1) THEN
== restore initial unique identifier column value here ==
END
IF (somehow_captured > 1) THEN
TRUNCATE TABLE #{table};
END IF;
END$$;
Manipulating this code in various aspects, I couldn't make it work, because of I am unfamiliar with PostgreSQL functions and blocks (and variables).
Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one of the "check procedure" units, cleaning procedure should consist of, but haven't accomplished it too.
I would appreciate any hints on how this procedure could be accomplished in PostgreSQL native way.
UPDATE:
I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers, in my case the switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post here the answers about "Obviously, you need DatabaseCleaner for PG" and so on and so on.
UPDATE 2:
The strategy described here recently was merged into DatabaseCleaner, https://github.com/bmabey/database_cleaner as :pre_count option (see README there).
If someone is interested with the current strategy, I use for this, see this Ruby-based repo https://github.com/stanislaw/truncate-vs-count for both MySQL and PostgreSQL.
My results:
MySQL: the fastest strategy for cleaning databases is truncation with following modifications:
if table is not empty
truncate.
else
if AUTO_INCREMENT is not 0
truncate.
end
end
PostgreSQL: The fastest strategy for cleaning databases is deletion with the same empty-checks as for MySQL, but with relying on currval instead:
if table is not empty
delete table
else
if currval is not 0
delete table
end
end
This is from where it began: https://github.com/bmabey/database_cleaner/issues/126
This is the result code and long discussion: https://github.com/bmabey/database_cleaner/pull/127
This is discussion on pgsql-performance mailing list: http://archives.postgresql.org/pgsql-performance/2012-07/msg00047.php
We began collecting users feedback proving my idea with first checking empty tables is right.