
Automated way of deleting millions of rows from Postgres tables


Postgres Version: PostgreSQL 10.9 (Ubuntu 10.9-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit

Before I ask my question, I would like to explain why I'm looking into this. We have a history table with more than 5 million rows, and it is growing every hour.

As the table grows, SELECT queries are becoming slower even though we have a proper index, so the obvious first choice for us is to delete the old records that are unused.

Approach #1

We tried deleting the records from the table using a simple `DELETE FROM table_name WHERE created_date < certain_date AND is_active = false`

  • This took a very long time.

Approach #2

Create a script that deletes the rows using a cursor-based approach.

  • This also takes a very long time.
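A batched delete along these lines keeps individual transactions short, but it still has to find, delete, and later vacuum every dead row, which is why the total runtime stays long (the table name `history`, the columns `created_date` / `is_active`, and the 90-day cutoff below are illustrative, not the real schema):

```sql
-- Illustrative batched delete: each statement removes at most 10,000 rows,
-- so no single transaction runs for long, but the total work is unchanged.
WITH doomed AS (
    SELECT ctid
    FROM history
    WHERE created_date < now() - interval '90 days'
      AND is_active = false
    LIMIT 10000
)
DELETE FROM history
WHERE ctid IN (SELECT ctid FROM doomed);
-- Re-run (e.g. from a driver script) until it reports DELETE 0.
```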

Approach #3

  • Create a new unlogged table.

  • Create an index on the new table.

  • Copy the contents of the old table into the new table.

  • Then set the new table to LOGGED.

  • Rename the old table as a backup.

  • Issue with this approach: it requires some downtime.

On live production instances, this would result in missing data and failures.
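A sketch of the copy-and-swap steps above (table names are illustrative). The unlogged copy skips WAL and is fast, but `SET LOGGED` rewrites the table, and any writes that land on the old table during the copy are lost, which is the downtime problem:

```sql
-- Illustrative copy-and-swap; writes must be paused while this runs.
CREATE UNLOGGED TABLE history_new (LIKE history INCLUDING ALL);

-- Copy only the rows worth keeping.
INSERT INTO history_new
SELECT * FROM history
WHERE created_date >= now() - interval '90 days'
   OR is_active;

-- Make the new table crash-safe again (this rewrites it into the WAL).
ALTER TABLE history_new SET LOGGED;

-- Swap the tables.
ALTER TABLE history RENAME TO history_backup;
ALTER TABLE history_new RENAME TO history;
```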

Approach #4

On further investigation, the performant way to delete unused rows is to create the table with partitions (https://www.postgresql.org/docs/10/ddl-partitioning.html), which would let us drop an entire partition immediately.

Questions with the above approach:

  1. How can I create partitions on an existing table?
  2. Will that require downtime?
  3. How can we configure Postgres to create partitions automatically? We can't really create partitions manually every day, right?
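On question 1: as far as I know, PostgreSQL 10 cannot convert an existing regular table to a partitioned one in place; the usual route is to create a new partitioned parent and copy (or, where bounds allow, `ATTACH`) the old data. A minimal sketch, with illustrative table and column names:

```sql
-- New partitioned parent (PostgreSQL 10 declarative range partitioning).
CREATE TABLE history_part (
    id           bigint,
    created_date timestamptz NOT NULL,
    is_active    boolean
) PARTITION BY RANGE (created_date);

-- One partition per month (could equally be per day).
CREATE TABLE history_2019_08 PARTITION OF history_part
    FOR VALUES FROM ('2019-08-01') TO ('2019-09-01');

-- Backfill from the old table, then swap names as in approach #3.
INSERT INTO history_part SELECT * FROM history;

-- Later, expiring a month of data is instant:
-- DROP TABLE history_2019_08;
```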

Any other approaches are also welcome. The key point is that I really want this to be automated rather than manual, because I would extend it to multiple tables.

Please let me know your thoughts; that would be very helpful.


Solution

  • I would go for approach 4, table partitioning.

    1. Create partitions
    2. New data goes directly to the correct partition
    3. Move old data (manually / scripted) to the correct partition
    4. Set up a cron job to create partitions for the next X days, if they don't exist already

    No downtime needed
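Step 4 could be a PL/pgSQL function invoked nightly from cron (or pg_cron). A sketch, assuming a parent table `history_part` range-partitioned by day on `created_date` (names illustrative):

```sql
-- Pre-create daily partitions for the next `days_ahead` days.
-- IF NOT EXISTS makes the function safe to re-run.
CREATE OR REPLACE FUNCTION create_history_partitions(days_ahead int)
RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    d date;
BEGIN
    FOR i IN 0 .. days_ahead LOOP
        d := current_date + i;
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS %I PARTITION OF history_part
                 FOR VALUES FROM (%L) TO (%L)',
            'history_' || to_char(d, 'YYYY_MM_DD'),
            d, d + 1);
    END LOOP;
END $$;

-- e.g. in crontab:
-- 0 1 * * *  psql -d mydb -c "SELECT create_history_partitions(7);"
```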