My team is planning a DR solution, and we need to sync data between the Greenplum Database clusters at our Production and DR sites.
We are running the 6.4 community edition, so tools like gpbackup and gprestore are not available. pg_dump and pg_restore are not an option because a large data set is involved. What is the most suitable solution for our scenario?
gpbackup and gprestore are one way Greenplum users commonly keep two clusters in sync.
Although gpbackup and gprestore don't ship with open source Greenplum Database, the tools are themselves open source and freely available from their own repository: https://github.com/greenplum-db/gpbackup
Because Greenplum distributes data across segments, the DR cluster must contain the same number of primary segments as the source cluster for a restore to succeed (although the number of segment hosts can differ).
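To illustrate the segment-count rule, here is a small Python sketch. The layouts are hypothetical: a 4-host production cluster with 8 primaries per host and an 8-host DR cluster with 4 primaries per host both have 32 primary segments, so they are compatible even though their host counts differ. The SQL string is one way to get the actual primary segment count on each cluster; `ClusterLayout` and `same_segment_count` are illustrative names, not part of any Greenplum tool, and the layout model assumes the same number of primaries on every host.

```python
from dataclasses import dataclass

# Run this on each cluster's master to count primary segments.
# content = -1 is the master (and standby master), so "content >= 0"
# keeps only segment instances, and role = 'p' keeps only primaries.
PRIMARY_SEGMENT_COUNT_SQL = (
    "SELECT count(*) FROM gp_segment_configuration "
    "WHERE content >= 0 AND role = 'p';"
)


@dataclass(frozen=True)
class ClusterLayout:
    """Hypothetical description of a cluster with a uniform segment layout."""
    segment_hosts: int
    primaries_per_host: int

    @property
    def primary_segments(self) -> int:
        return self.segment_hosts * self.primaries_per_host


def same_segment_count(prod: ClusterLayout, dr: ClusterLayout) -> bool:
    """True if the clusters have equal primary segment counts.

    Host counts are allowed to differ; only the total number of
    primary segments has to match.
    """
    return prod.primary_segments == dr.primary_segments


if __name__ == "__main__":
    prod = ClusterLayout(segment_hosts=4, primaries_per_host=8)  # 32 primaries
    dr = ClusterLayout(segment_hosts=8, primaries_per_host=4)    # 32 primaries
    print(same_segment_count(prod, dr))
```

In practice you would compare the result of `PRIMARY_SEGMENT_COUNT_SQL` from both clusters rather than compute it from a layout; the dataclass just makes the "hosts may differ, segments may not" point concrete.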
A common approach we see Greenplum users implement is to back up off-cluster to a third-party storage system (NFS, S3-compatible storage, etc.) and then restore to the destination/DR cluster from there.
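As a rough sketch of that workflow over a shared NFS mount, the Python below only assembles the gpbackup and gprestore command lines; it does not run them. It assumes a hypothetical mount point `/mnt/gp_dr` that is visible at the same path on the master and every segment host of both clusters, because gpbackup writes each segment's files under `--backup-dir` on that segment's own host. The database name `proddb` and the timestamp are placeholders. It also passes `--leaf-partition-data` on every backup so that later `--incremental` backups can build on the full one; keep in mind that incremental backups only skip unchanged append-optimized tables, while heap tables are backed up in full each time.

```python
import shlex
from typing import List

# Hypothetical NFS mount, mounted at the same path on the master and on
# every segment host of both the production and the DR cluster.
SHARED_DIR = "/mnt/gp_dr"


def backup_cmd(dbname: str, backup_dir: str = SHARED_DIR,
               incremental: bool = False, jobs: int = 4) -> List[str]:
    """Build a gpbackup command line for the production cluster."""
    cmd = ["gpbackup", "--dbname", dbname, "--backup-dir", backup_dir,
           "--jobs", str(jobs),
           # Required with --incremental, and also needed on the full
           # backup that later incrementals are based on.
           "--leaf-partition-data"]
    if incremental:
        cmd.append("--incremental")
    return cmd


def restore_cmd(timestamp: str, backup_dir: str = SHARED_DIR,
                create_db: bool = True, jobs: int = 4) -> List[str]:
    """Build a gprestore command line for the DR cluster."""
    # gpbackup identifies each backup by a 14-digit YYYYMMDDHHMMSS timestamp.
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("timestamp must be YYYYMMDDHHMMSS")
    cmd = ["gprestore", "--timestamp", timestamp, "--backup-dir", backup_dir,
           "--jobs", str(jobs)]
    if create_db:
        # Only appropriate when the target database does not exist yet on DR.
        cmd.append("--create-db")
    return cmd


if __name__ == "__main__":
    print(shlex.join(backup_cmd("proddb")))
    print(shlex.join(restore_cmd("20240101120000")))
```

The timestamp passed to gprestore is the one gpbackup reports for the backup you want to restore; everything else here is a placeholder to adapt to your environment.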
For S3-compatible storage, there is an open source plugin, gpbackup_s3_plugin, available here: https://github.com/greenplum-db/gpbackup-s3-plugin
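For the S3 route, both gpbackup and gprestore accept a `--plugin-config` YAML file that points at the plugin binary and its options. The Python below is a minimal sketch that renders such a file as plain text. The key names (`executablepath`, `region`, `endpoint`, `aws_access_key_id`, `aws_secret_access_key`, `bucket`, `folder`) follow the example in the plugin's README, but please check them against the plugin version you install. The bucket, folder, region, and endpoint values are hypothetical, and the credentials are placeholders to replace.

```python
from typing import Optional


def s3_plugin_config(bucket: str, folder: str, region: str,
                     endpoint: Optional[str] = None,
                     executable: str = "$GPHOME/bin/gpbackup_s3_plugin") -> str:
    """Render a gpbackup_s3_plugin config file as YAML text.

    Credentials are left as placeholders; fill them in (or manage them
    securely) before using the file.
    """
    lines = [
        f"executablepath: {executable}",
        "options:",
        f"  region: {region}",
    ]
    if endpoint:
        # Only needed for S3-compatible storage (e.g. an on-prem object store).
        lines.append(f"  endpoint: {endpoint}")
    lines += [
        "  aws_access_key_id: <ACCESS_KEY_ID>",
        "  aws_secret_access_key: <SECRET_ACCESS_KEY>",
        f"  bucket: {bucket}",
        f"  folder: {folder}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(s3_plugin_config("gp-dr-backups", "prod", "us-east-1"), end="")
```

The same file would then be passed to gpbackup on production and to gprestore (together with the backup's `--timestamp`) on DR via `--plugin-config`, for example a hypothetical `/home/gpadmin/s3_config.yaml`. The plugin binary has to be installed at the configured path on every host of both clusters.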
Let us know if you have any other questions.
oak