amazon-web-services backup amazon-dynamodb emr amazon-data-pipeline

Is there a way to group my DynamoDB export tasks on one EMR cluster?

When I set up a recurring backup via the export function in the DynamoDB console, the task it creates launches a new EMR cluster each time it runs. Some of my tables need to be backed up but are fairly small. What I end up with is a huge number of large servers running to back up some relatively small tables. Is there an easy way to chain these tasks so they run on one server group, in series or in parallel?

Solution

Yes, it is possible. There is no direct way to do it, but it can be done with some additional tweaking on the Data Pipeline side. It helps to first understand how Data Pipeline runs your export job by default.

When you click the export button in the DynamoDB console, it takes you to the Data Pipeline console to create a pipeline for the export.
After filling out the template, instead of running it, you can use the Edit in Architect feature to alter the template, which by default works with only one table.
On the Architect page, in the Activities section, you will find an EmrActivity that runs an EMR step with the parameters shown below. This EMR step runs the export job using the values you initially entered in the template. Note that its RunsOn field points to the EmrClusterForBackup resource, which you can find in the Resources section.

s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}

To run the export on other DynamoDB tables using the same EMR resource, create another EmrActivity object by clicking Add and then choosing EmrActivity in Architect. On this new activity, set RunsOn to the same resource the previous activity uses, and manually edit the step parameters to include the other table's name and its export path, for example:

s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,s3://myexport-bucket/table2/,table2,0.9

You can extend it for multiple tables.
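
As a rough illustration of extending the step string to several tables, here is a minimal Python sketch that builds one step string per table. The JAR path and class name are copied from the step above; the region `us-east-1`, the `table3` entry, and the helper name `build_export_step` are hypothetical and only there for the example.

```python
# JAR path and main class are copied from the step string shown above.
JAR_TEMPLATE = (
    "s3://dynamodb-emr-{region}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar"
)
EXPORT_CLASS = "org.apache.hadoop.dynamodb.tools.DynamoDbExport"


def build_export_step(region, output_path, table_name, read_ratio):
    """Return the comma-separated EMR step string for one table export."""
    jar = JAR_TEMPLATE.format(region=region)
    return ",".join([jar, EXPORT_CLASS, output_path, table_name, str(read_ratio)])


# Hypothetical tables and export paths; replace with your own.
tables = {
    "table2": "s3://myexport-bucket/table2/",
    "table3": "s3://myexport-bucket/table3/",
}

# Region is an assumption for this example.
steps = [
    build_export_step("us-east-1", path, name, 0.9)
    for name, path in tables.items()
]

for step in steps:
    print(step)
```

Each printed line can be pasted into the Step field of a separate EmrActivity that uses the same RunsOn resource.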

Note: For many tables, this is easier to do by working with the pipeline definition as a JSON file: edit it to add more activities and parameters, then use it to run the pipeline later.
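
A sketch of the JSON approach, assuming the usual pipeline definition layout in which each activity is an object with `id`, `type`, `runsOn`, and `step` fields. The activity ids, the helper `make_activity`, the region, and the table names are hypothetical, and the cluster id `EmrClusterForBackup` is an assumption: check the resource id in your own exported definition. The optional `dependsOn` reference is one way to ask for the exports to run in series rather than in parallel.

```python
import json

# Assumed id of the shared EMR cluster resource; verify it in your definition.
CLUSTER_ID = "EmrClusterForBackup"

# Step prefix copied from the answer above, with a hypothetical region filled in.
STEP_PREFIX = (
    "s3://dynamodb-emr-us-east-1/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,"
    "org.apache.hadoop.dynamodb.tools.DynamoDbExport"
)


def make_activity(activity_id, step, cluster_id, depends_on=None):
    """Build one EmrActivity object that runs on the shared cluster."""
    activity = {
        "id": activity_id,
        "name": activity_id,
        "type": "EmrActivity",
        "runsOn": {"ref": cluster_id},  # every activity points at the same cluster
        "step": step,
    }
    if depends_on is not None:
        # Wait for the named activity to finish before starting this one.
        activity["dependsOn"] = {"ref": depends_on}
    return activity


# Hypothetical table names and export paths.
exports = [
    ("table2", "s3://myexport-bucket/table2/"),
    ("table3", "s3://myexport-bucket/table3/"),
]

activities = []
previous_id = None
for table, path in exports:
    activity_id = "ExportActivity_" + table
    step = ",".join([STEP_PREFIX, path, table, "0.9"])
    activities.append(make_activity(activity_id, step, CLUSTER_ID, previous_id))
    previous_id = activity_id  # chain the next export after this one

# These objects would be merged into the "objects" array of the definition file.
print(json.dumps(activities, indent=2))
```

Dropping the `depends_on` argument leaves the activities independent of each other while still sharing the one cluster.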