Search code examples
postgresqlherokuheroku-postgres

What's the difference between Heroku's Postgres Continuous Protection vs including a Follower database for integrity and recovery


I'm considering deploying an app to Heroku along with a Postres Standard database plan. I'm keen on ensuring data integrity and ensuring in no case that my customer's data can be lost if the database becomes corrupted or some other similar issue. I also want to ensure a smooth recovery process in tis even. So I have the following questions:

  1. First, I'm assuming with Continuos there's a still a possibility that a database can become corrupted. Is this true?

  2. What's provides more integrity, protection, and ease of recovery if a database becomes corrupted: Standard DB / with Continuos Protection or Standard DB with a Follower DB.

  3. If by chance the DB becomes corrupted, or an database integrity issues arise, how will Heroku remediate (given the database is a "managed" service). Is it automated or I have to work with Support manually to remediate?

I would love to hear your thoughts on this. My experience in the past has been with MySQL but not Postgres, which I hear great things about.

Thanks


Solution

  • Caveat: I have some experience with Postgresql, but I don't have any experience with Heroku as such.

    What Heroku calls 'Continuous Protection' and 'follower' databases are implemented using Postgresql's Continuous Archiving and streaming replication functionality. They have provided a range of administrative tools and infrastructure around these functions to make them easier to use.

    Both of these functions make use of the fact that Postgresql writes all updates that it is making to databases in a Write-Ahead Log (WAL).

    With Continuous Archiving, one takes a complete copy of all of the underlying files in the database - this is referred to as the base backup. One also collects all WAL files produced by the database, both during and after production of the base backup. Note that you do not need to stop the database in order to make the base backup - it is a fairly unobtrusive process.

    If the worst happens, and it is necessary to recover the database from the backup, you just restore the base dump, configure the database so it knows where to find the archived WAL files, and start it up. It will then replay the WAL files in sequence until it is fully up to date.

    Note that you can also stop the replay early. This can be extremely useful, as you will see in my answer to your first question:

    1. First, I'm assuming with Continuos there's a still a possibility that a database can become corrupted. Is this true?

    Yes, of course. Database corruption can happen for a number of reasons: hardware failure, a software fault in the database, a fault in your application, or even operator error.

    One of the benefits of continuous archiving, though, is that you can replay the WAL files up to a particular point in time, so you can effectively rewind back to the point immediately before the database became corrupted.

    As mentioned above, a Follower DB uses Postgresql's 'Streaming Replication' function. With this function, you restore your base backup onto another server, configure it to connect to the master database and fetch WAL files in real time as they are produced. The follower then keeps up to date with any changes made on the master.

    1. Whats provides more integrity, protection, and ease of recovery if a database becomes corrupted: Standard DB / with Continuos Protection or Standard DB with a Follower DB.

    Ease of recovery is the difference.

    If you have a Follower DB, it is a hot standby - if the master fails for some reason, you can switch your application over to the follower with minimal downtime. On the other hand, if you have a large database and you have to restore it from the last base backup and then replay all the WAL files produced since - well that could take a long time, days even if it was a really large database.

    Note that also, however, that a follower DB will be of no use if your database becomes corrupted due to, for example, an administrator accidentally dropping the wrong table. The table will be dropped in the follower only a few seconds later. They are like lemmings going over a cliff. The same applies if your application corrupts the database due to a bug, or a hacker, or whatever. Even with the follower, you must have a proper backup in place, either a Continuous Archive or a normal pg_dump.

    1. If by chance the DB becomes corrupted, or an database integrity issues arise, how will Heroku remediate (given the database is a "managed" service). Is it automated or I have to work with Support manually to remediate?

    Their documentation indicates that premium plans do feature automated failover. This would be useful in the event of a hardware or platform failure and most kinds of database failure, where the system can detect that the master database has gone down and initiate a failover.

    In the case where the database becomes corrupted by the application itself (or a hasty admin) then I suspect you would have to manually initiate failover.