Search code examples
postgresqlgoogle-cloud-functionsdigital-ocean

How do I schedule backups of an external PostgreSQL using Google Cloud Serverless?


We have a PostgreSQL cluster managed by DigitalOcean that we want to back up to Google Cloud Storage on a schedule. I'm planning on creating a Cloud Function that runs on a schedule.

Because our databases aren't stored on Google Cloud, I can't use this guide that Google has published. And since pg_dump has to be installed on the system to be available, that isn't an option in Cloud Functions either.

What options do I have? Is it possible to use something like psycopg2 in Python or pg in Node.js?


Solution

  • since pg_dump has to be installed on the system to be available, that isn't an option in Cloud Functions either. What options do I have?

    1. Cloud Run. While Cloud Functions are supposed to abstract the underlying platform away, Cloud Run lets you define both the function and the platform it'll wake up on. Both services make up what Google advertises as Cloud Serverless.
    2. Cloud Compute, a tiny VM with a simple script on it, scheduled to wake up, run it, then go back to sleep. You can save that as a custom image and configure multiple small backup workers like that.

    The latter would kind of emulate what Cloud Run is supposed to handle for you, but it might be more intuitive (that's pretty much how you'd do it at home or on-prem). Might also be a bit more cost-effective, since more "cloud-native" solutions like Cloud Run, typically come at a premium - similar to how a managed database usually costs a bit more than a VM rented from the same provider, hosting the exact same db, with the exact same resources (except provider's secret sauce they justify the premium with).

    Still, a job that involves just waking up and piping pg_dump|gsutil cp for a few minutes should cost you little to nothing regardless of which one you pick. You might want to look into multithreaded pg_dump -j to speed things up at the cost of some more CPU, and gsutil -m for multithreaded upload.