I'm trying to optimize the backend of an information system for high availability, which involves splitting off the part needed for time-critical client requests (front office) from the rest (back office).
The front office will have redundant application servers with load balancing for maximum performance and will use a database with pre-computed data. The back office will periodically prepare data for the front office based on client statistics and some external data.
A part of the data schema will be shared between the back and front office, but not the whole databases, only parts of some tables. The data does not need to match at all times; it will be synchronized between the two databases periodically. Continuous synchronization is also viable, but there is no real-time consistency requirement, and batch-style synchronization seems better in terms of control, debugging, and backups. I expect no need for conflict resolution because data will mostly grow and change only on one side.
The solution should allow defining corresponding tables and columns, and then insert new rows and update changed ones. Ideally it should use the data model defined in Groovy classes (probably through annotations?), as both applications run on Grails. The synchronization may use the existing Grails web applications or run externally, maybe even on the database server (PostgreSQL) alone.
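To make the requirement concrete, here is a minimal sketch of such a mapping-driven batch sync. It is written in Python against SQLite only so that it runs standalone; the table names, column names, and the MAPPING structure are all hypothetical, and it copies every source row rather than detecting which ones changed.

```python
import sqlite3

# Hypothetical mapping: which back-office columns feed which front-office columns.
MAPPING = {
    "source_table": "bo_product",
    "target_table": "fo_product",
    "key": ("id", "id"),                              # (source key, target key)
    "columns": {"name": "title", "price": "price"},  # source column -> target column
}


def sync_table(src, dst, mapping):
    """Copy the mapped columns of every source row into the target table.

    Existing target rows (matched by key) are updated, missing ones are
    inserted. Target columns outside the mapping are left untouched.
    """
    src_key, dst_key = mapping["key"]
    src_cols = list(mapping["columns"])
    dst_cols = [mapping["columns"][c] for c in src_cols]

    select_sql = "SELECT {}, {} FROM {}".format(
        src_key, ", ".join(src_cols), mapping["source_table"])
    update_sql = "UPDATE {} SET {} WHERE {} = ?".format(
        mapping["target_table"],
        ", ".join(c + " = ?" for c in dst_cols),
        dst_key)
    insert_sql = "INSERT INTO {} ({}, {}) VALUES ({})".format(
        mapping["target_table"], dst_key, ", ".join(dst_cols),
        ", ".join("?" * (len(dst_cols) + 1)))

    counts = {"inserted": 0, "updated": 0}
    for row in src.execute(select_sql).fetchall():
        key, values = row[0], list(row[1:])
        # Try an update first; fall back to an insert if no row matched.
        if dst.execute(update_sql, values + [key]).rowcount == 0:
            dst.execute(insert_sql, [key] + values)
            counts["inserted"] += 1
        else:
            counts["updated"] += 1
    dst.commit()
    return counts
```

A real job would presumably restrict the SELECT to rows changed since the last run (for example via a timestamp column) and run against the two PostgreSQL databases instead.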
There are systems for replicating whole mirrored databases, but I wasn't able to find any solution suiting my needs. Do you know of any existing framework that would help with this, or is writing my own the only option?
I ended up using Londiste from SkyTools. The project page on the pgFoundry site lists quite old binaries (and is currently down), so you had better build it from source.
It replicates in one direction only (master-slave), so one has to set up two synchronization instances for bidirectional sync. Note that each instance consists of two Londiste binaries (master and slave worker) and a ticker daemon that pushes the changes.
To reduce synchronization traffic, you can extend the polling period (1 second by default) in the configuration file, or even turn polling off completely by stopping the ticker and then triggering the sync manually by running the SQL function pgq.ticker on the master.
I solved the issue of partial column replication by writing a simple custom handler (a londiste.handler.TableHandler subclass) with the column mapping configured in the database. The mapping configuration is not model-driven (yet) as I originally planned, but I only need to replicate common columns, so this solution is sufficient for now.
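The core of such a handler is a per-row column projection. The sketch below shows only that logic in plain Python and deliberately does not import or subclass anything from Londiste; the function names and the (table, source_column, target_column) layout of the configuration rows are assumptions made for illustration, not the actual handler code.

```python
def load_column_map(config_rows, table):
    """Build {source_column: target_column} for one table.

    config_rows is assumed to be an iterable of
    (table, source_column, target_column) tuples, e.g. the result of a
    query against a hypothetical mapping table in the database.
    """
    return {src: dst for tbl, src, dst in config_rows if tbl == table}


def map_row(row, column_map):
    """Keep only the mapped columns of a row dict, renamed for the target."""
    return {dst: row[src] for src, dst in column_map.items() if src in row}


# Hypothetical configuration, as it might be read from the database.
CONFIG_ROWS = [
    ("product", "id", "id"),
    ("product", "name", "title"),
    ("customer", "id", "id"),
]

column_map = load_column_map(CONFIG_ROWS, "product")
mapped = map_row({"id": 1, "name": "widget", "internal_note": "x"}, column_map)
# mapped now holds only the shared columns: {"id": 1, "title": "widget"}
```

In the real handler, this projection would run inside the TableHandler subclass before each replicated event is applied on the slave.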