greenplum hawq

Greenplum DCA: How to back up and restore from version V2 to V3


We have a small array of Greenplum DCA V1 and V3 appliances and are trying to work out the backup/restore steps between them.

As a novice with DCA appliances, I am banging my head against the wall trying to understand how the parallel backup process works logically.

We tried to run a parallel backup using gpcrondump/gpdbrestore, but did not understand how it executes:

on the master host
on the segment hosts

Question: How does parallel backup work in a master-segment DCA environment when moving from one version to another?


Solution

  • gpcrondump executes a backup in parallel by coordinating the backups across all segments. By default, each segment creates a db_dumps directory under its own $PGDATA directory, with a sub-directory beneath it named after the backup date (YYYYMMDD).

    For example, let's say you have 4 segments per host and hosts sdw1-4. The dumps will be created in:

    /data1/gpseg0/db_dumps/20161111/
    /data1/gpseg1/db_dumps/20161111/
    /data2/gpseg2/db_dumps/20161111/
    /data2/gpseg3/db_dumps/20161111/
    

    The same pattern repeats for the segments on every other host.
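The directory layout above can be sketched in a few lines of Python. This is only an illustration of the naming pattern described in this answer, not code taken from gpcrondump; the helper name `dump_dir` is hypothetical, and the segment data directories are the ones from the example paths.

```python
from datetime import date
from pathlib import PurePosixPath


def dump_dir(pgdata: str, day: date) -> str:
    """Return <pgdata>/db_dumps/<YYYYMMDD> for one segment (hypothetical helper)."""
    return str(PurePosixPath(pgdata) / "db_dumps" / day.strftime("%Y%m%d"))


# Data directories of the four segments on one host, as in the example above.
segment_dirs = ["/data1/gpseg0", "/data1/gpseg1", "/data2/gpseg2", "/data2/gpseg3"]

# One dump directory per segment, all sharing the same date sub-directory.
dump_dirs = [dump_dir(d, date(2016, 11, 11)) for d in segment_dirs]
for d in dump_dirs:
    print(d)
```

Each segment gets its own dump directory, so no two segments ever write to the same location.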

    Each segment dumps only its own data to this dump location. gpcrondump names the files, makes sure each dump completes successfully, and so on, while every segment dumps its data independently of the other segments. That is why the backup runs in parallel.
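The coordination idea can be illustrated with a toy Python sketch. It is not how gpcrondump is implemented; it only shows the pattern of one coordinator starting an independent dump per segment and then checking that each one finished. The names `SEGMENT_DATA`, `dump_segment`, and `coordinate_backup` are made up for this example, and the "segments" are just in-memory lists.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a cluster: content id -> rows held by that segment.
SEGMENT_DATA = {0: ["a", "b"], 1: ["c"], 2: ["d", "e", "f"], 3: []}


def dump_segment(content_id: int, rows: list) -> tuple:
    """Pretend to dump one segment; return (content id, rows written)."""
    return content_id, len(rows)


def coordinate_backup(segments: dict) -> dict:
    """Start one dump per segment in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        futures = [
            pool.submit(dump_segment, cid, rows) for cid, rows in segments.items()
        ]
        # result() re-raises any exception, so a failed segment fails the backup.
        return dict(f.result() for f in futures)


report = coordinate_backup(SEGMENT_DATA)
print(report)
```

The coordinator never touches the data itself; it only starts the per-segment work and verifies the outcome, which mirrors the division of labour between the master and the segments described here.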

    The master also gets a backup directory, but it holds very little data. It is mainly metadata about the backup that was executed.

    The metadata for each backup is pretty important. It contains the segment id and the content id for the backup.

    gpdbrestore restores a backup created by gpcrondump. It reads the backup files, makes sure the segment id and content id match the target, and loads the data into the database. So, the number of segments in a backup must match the number of segments you restore to, and the mapping of segment id to content id has to be the same as well.
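The matching rule can be expressed as a small pre-flight check. This is a sketch under assumptions, not gpdbrestore's actual logic: it assumes you have already collected a segment id to content id mapping for the backup and for the target cluster as plain dictionaries, and the function name `mapping_mismatches` is hypothetical.

```python
def mapping_mismatches(backup: dict, target: dict) -> list:
    """Compare segment id -> content id mappings; return a list of problems."""
    problems = []
    if len(backup) != len(target):
        problems.append(
            f"segment count differs: backup={len(backup)}, target={len(target)}"
        )
    for seg_id, content_id in sorted(backup.items()):
        if seg_id not in target:
            problems.append(f"segment id {seg_id} is missing on the target")
        elif target[seg_id] != content_id:
            problems.append(
                f"segment id {seg_id}: backup content id {content_id}, "
                f"target content id {target[seg_id]}"
            )
    return problems


# Hypothetical mappings for illustration only.
source_cluster = {2: 0, 3: 1, 4: 2, 5: 3}
same_layout = {2: 0, 3: 1, 4: 2, 5: 3}
smaller_cluster = {2: 0, 3: 1}

print(mapping_mismatches(source_cluster, same_layout))
print(mapping_mismatches(source_cluster, smaller_cluster))
```

An empty list means the two layouts agree; anything else indicates that a plain backup and restore between the two clusters would not line up.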

    Migration from one cluster to another can be done in multiple ways. One way is to do a backup and then a restore. This requires the same configuration in both clusters, and you have to copy all of the backup files from one cluster to the other as well. Alternatively, you could back up to and restore from a backup device like DataDomain.

    You can also use a built-in tool called gptransfer. It doesn't use a backup; instead, it uses external tables to transfer data from one cluster to another. The configuration of the two clusters doesn't have to be the same when using this tool, but if you are going from a larger cluster to a smaller cluster, the transfer will not be done in parallel.

    I highly recommend you reach out to your Pivotal Account Rep to get some assistance. More than likely, you have already paid for services when buying the new DCA that will cover part or all of the migration work. You will have to configure networking between the two clusters which requires some help from EMC too.

    Good luck!!