greenplum

How to sync data between two Greenplum clusters in remote data centers (DR)


My team is planning for a DR solution and we need to sync data between Greenplum Databases in Production and DR sites.

We are running the 6.4 community edition, so tools like gpbackup and gprestore are not available. pg_dump and pg_restore are not an option because a large data set is involved. What is the most suitable solution for our scenario?


Solution

  • gpbackup and gprestore are one way Greenplum users commonly keep two clusters in sync.

    While gpbackup and gprestore don't ship with open source Greenplum Database, the tools are themselves open source and freely available from their own repository: https://github.com/greenplum-db/gpbackup

    Because Greenplum distributes data across segments, the DR cluster must contain the same number of primary segments as the source cluster for a restore to succeed (although the number of segment hosts can differ).
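
One way to check the segment-count requirement before attempting a restore is to compare the number of primary segments reported by each cluster. The sketch below is only an illustration: it assumes `psql` is on the PATH and can reach both coordinators without a password prompt, and the host names `prod-mdw` and `dr-mdw` are hypothetical. It reads the `gp_segment_configuration` catalog table, where `role = 'p'` marks primaries and `content = -1` is the coordinator (master) entry.

```python
import subprocess

# Counts primary segments only; content = -1 is the coordinator (master)
# entry, so it is excluded from the count.
PRIMARY_COUNT_SQL = (
    "SELECT count(*) FROM gp_segment_configuration "
    "WHERE role = 'p' AND content >= 0"
)


def psql_count_command(host, dbname="postgres"):
    """Build the psql invocation; -A and -t strip alignment and headers."""
    return ["psql", "-h", host, "-d", dbname, "-At", "-c", PRIMARY_COUNT_SQL]


def primary_segment_count(host, dbname="postgres"):
    """Run the query against one cluster and return the count as an int."""
    result = subprocess.run(
        psql_count_command(host, dbname),
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def same_primary_count(prod_host, dr_host):
    """True when both clusters report the same number of primary segments."""
    return primary_segment_count(prod_host) == primary_segment_count(dr_host)
```

Calling `same_primary_count("prod-mdw", "dr-mdw")` (hypothetical hosts) before scheduling a restore would flag a mismatch early rather than partway through a restore.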

    A common approach we see Greenplum users implement is to back up off-cluster to a third-party storage system (NFS, S3-compatible storage, etc.) and restore to the destination/DR cluster from there.

    There is an open source gpbackup_s3_plugin available here: https://github.com/greenplum-db/gpbackup-s3-plugin
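
The backup-then-restore flow described above could be scripted roughly as follows. This is a sketch under assumptions, not a tested DR procedure: the database name, the plugin configuration path, and the timestamp are hypothetical placeholders, and the helpers only assemble command lines using the `--dbname`, `--backup-dir`, `--plugin-config`, `--timestamp`, and `--create-db` options of gpbackup and gprestore. The code treats a plugin configuration and a backup directory as alternatives; check the documentation for your gpbackup version before relying on that.

```python
import shlex


def _storage_args(plugin_config, backup_dir):
    """Return the storage option pair; the two targets are treated as alternatives."""
    if plugin_config and backup_dir:
        raise ValueError("choose either plugin_config or backup_dir, not both")
    if plugin_config:
        return ["--plugin-config", plugin_config]
    if backup_dir:
        return ["--backup-dir", backup_dir]
    return []


def gpbackup_command(dbname, plugin_config=None, backup_dir=None):
    """Command to run on the production coordinator."""
    return ["gpbackup", "--dbname", dbname] + _storage_args(plugin_config, backup_dir)


def gprestore_command(timestamp, plugin_config=None, backup_dir=None, create_db=True):
    """Command to run on the DR coordinator, using the timestamp gpbackup reported."""
    cmd = ["gprestore", "--timestamp", timestamp]
    cmd += _storage_args(plugin_config, backup_dir)
    if create_db:
        cmd.append("--create-db")
    return cmd


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    config = "/home/gpadmin/s3_config.yaml"
    print(shlex.join(gpbackup_command("sales", plugin_config=config)))
    print(shlex.join(gprestore_command("20240101120000", plugin_config=config)))
```

The same plugin configuration file would need to be present on both clusters so that the DR side reads from the location the production side wrote to.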

    Let us know if you have any other questions.

    oak