Search code examples
postgresqlamazon-web-servicesamazon-s3database-migration

Migrate / Archive inactive data from Postgres to S3


We have a Postgres table with tenants and multiple foreign key tables with tenant_id. We would like to archive all the data to an s3 bucket and deleted from the Postgres table whenever a tenant is set to inactive (is_active = false) in order to save on storage costs.

I'm wondering what is the best way to do this? I was looking at Database Migration Services but it seems like that would replicate all of the data not just the inactive data. I would also need to just delete the inactive data afterwards not the entire table. I also thought to identify all the tables that need to migrate data and run a script to save the data to an s3 bucket and then delete the data from the source tables if I was able to successfully replicate the data into S3.

Are there any other methods to accomplish this easily? There are quite a bit of tables connected to a tenant and I want to make sure there isn't an easier way to migrate the inactive data than writing a long script.


Solution

  • You have two choices:

    • Create a Snapshot of the database, which contains a complete copy of the database. Then, delete the data from the database. If the data is needed in future, you can Restore the Snapshot to a new database and recover the data. You will be charged for storing the snapshot. OR
    • Export the data, then delete the data. It won't be easy to 'restore' the data into the database in future, but at least you will have a copy of it in S3.

    You can Export data from an RDS for PostgreSQL DB instance to Amazon S3 - Amazon Relational Database Service by using the aws_s3 extension. It takes an SQL statement (eg select * from table where tenant_id = 12) and a bucket and key for where to store the data. It is saved in CSV format (but options can modify things like delimiters).

    So, just run the export command for all tables and rows that you want to keep.