Search code examples
postgresqlamazon-rdsparquet

How to load an RDS snapshot (in parquet format) into a local PostgreSQL


There are a million tutorials on how to import PostgreSQL data into RDS, and how to export RDS database snapshots to S3, and how to convert from PostgreSQL to Parquet, but I can't find a single article or SO question about how to properly go the other way: I need to load a database snapshot that RDS exported to S3 (as a series of directories and parquet files) into a local PostgreSQL instance.

I exported the data just fine, and synced it to my local system using the AWS CLI:

aws s3 cp s3://snapshotbucketname .

But what comes next, now that I have the actual snapshot living on my drive? What's the correct way to turn this from "not a psql database at all" into a psql database that I can work with locally?


Solution

  • As far as I can tell this is straight up not possible using the AWS console and CLI tool. Instead you need to set up your database in RDS to be in a VPC that you can access, so that you can then use pg_dump to download your database, after which you can use pg_restore locally.

    For prod databases, they're in a VPC that's heavily restricted (if configured right) and the odds that you directly connect to it are near zero. In those cases, take a snapshot of the database ("maintenance" tab in the RDS console), then select that snapshop, and restore it to a new database, in a VPC and part of a security group that you do have access to. Once that's done, dump it, deleted the new database, and then delete the snapshot so you don't have a phantom db hanging around costing you money.