Search code examples
postgresqlreplication

Postgresql redundancy for read-only queries with refresh on demand


I was wondering if there are any postgresql tools which would facilitate replication to read-only server on demand. Here is problem description:

  • Every data changing process is running as multiple transactions (processes are complex - for example, load data, run calculation, run more data, etc.). Until process is finished, I consider database to be inconsistent for reporting.

  • Reports should use only consistent data, but shouldn't wait for the data changing process to complete - they should use old (consistent) data.

Ideally, reports should run on a consistent snapshot of the database. The idea to solve this problem is as follows:

  • have two servers - master and slave. Initially both having exactly same data.
  • all data-changing processes will run on master
  • all reporting tasks will run on slave
  • once data-changing process is completed, refresh of slave is triggered. Trigger would be for example: run some command, create a file on filesystem, etc. Reading partially refreshed slave is not permited.
    • this can't take too long - should be proportional to size of changes on master

Slave will not be used for fail over, this is not the purpose.

Is there any good solution to this problem? Or maybe it can be solved in a different way?


Solution

  • Looks like there is a nice solution to this. Postgres >= 9.2 has possibility to export a snapshot id of current transaction. Every transaction which uses the same snapshot id, will see the same data at the beginning.

    In my case, I can create a "dummy" transaction and export its snapshot it. Dummy transaction will not do anything, it will start when a process is starting and finish when it's finished. All the reports will use this snapshot id to report data which was visible at the time, when dummy transaction was created.

    More about snapshots here: https://www.willglynn.com/2013/10/25/postgresql-snapshot-export/