Tags: amazon-web-services, aws-glue

How to copy data from Amazon S3 to DDB using AWS Glue


I am following the AWS documentation on how to transfer a DDB table from one account to another. There are two steps:

  1. Export DDB table into Amazon S3
  2. Use a Glue job to read the files from the Amazon S3 bucket and write them to the target DynamoDB table

I was able to do the first step. Unfortunately, the instructions don't explain how to do the second step. I have worked with Glue a couple of times, but the console UI is not very user-friendly and I have no idea how to achieve this.

Can somebody please explain how to import the data from S3 into the DDB table?


Solution

  • I just used AWS Glue for this purpose. You will need to create a new IAM role for your Glue job; it needs access to both S3 and DynamoDB.

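    For reference, this is roughly how such a role could be set up with boto3. The role name and the broad managed policies below are just placeholders I picked for illustration; scope the permissions down to your bucket and table for anything beyond a one-off migration.

    import json
    import boto3

    iam = boto3.client("iam")

    # Hypothetical role name -- use whatever fits your naming convention
    role_name = "GlueS3ToDynamoDBRole"

    # Trust policy that lets the Glue service assume the role
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )

    # Broad managed policies for brevity; restrict to the specific bucket/table in production
    for policy_arn in [
        "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
        "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
        "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
    ]:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
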
    1. Export your source DynamoDB table data to an S3 bucket (a boto3 sketch for this export is at the end of this answer).
    2. Go to AWS Glue and create a new job.
    3. Set the source as your S3 bucket and set the path to the data directory (e.g. s3://your-bucket-name/AWSDynamoDB/01234562352725-cb24aab6/data/)
    4. Create a custom transform script with the following structure:
    # Relies on imports that Glue generates for you, in particular:
    #   from awsglue.transforms import ApplyMapping
    #   from awsglue.dynamicframe import DynamicFrameCollection
    def MyTransform(glueContext, dfc) -> DynamicFrameCollection:
        # Grab the DynamicFrame produced by the S3 source node
        S3bucket_node = dfc["AmazonS3_node123456789"]

        # Flatten the DynamoDB export layout (Item.<field>.<type>) into plain columns
        ApplyMapping_node2 = ApplyMapping.apply(
            frame=S3bucket_node,
            mappings=[
                ("Item.digest.S", "string", "digest", "string"),
                ("Item.locale.S", "string", "locale", "string"),
                ("Item.value.S", "string", "value", "string"),
                ("Item.translation.S", "string", "translation", "string"),
                ("Item.created_at.S", "string", "created_at", "string"),
            ],
            transformation_ctx="ApplyMapping_node2",
        )

        # Write the mapped records directly to the target DynamoDB table
        S3bucket_node3 = glueContext.write_dynamic_frame.from_options(
            frame=ApplyMapping_node2,
            connection_type="dynamodb",
            connection_options={"dynamodb.output.tableName": "<destination-dynamo-tablename>"},
        )

        return DynamicFrameCollection({"S3bucket_node3": S3bucket_node3}, glueContext)
    
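    For context, the visual editor wraps your nodes in a generated job script that already contains the imports and GlueContext setup the transform relies on. Below is a rough, abbreviated sketch of what that generated script tends to look like; the node names, format and connection options will differ in your job, so treat it as orientation rather than something to copy verbatim.

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.dynamicframe import DynamicFrameCollection

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # ... the MyTransform definition from step 4 appears here ...

    # S3 source node -- the generated name is the node ID referenced inside MyTransform
    AmazonS3_node123456789 = glueContext.create_dynamic_frame.from_options(
        connection_type="s3",
        format="json",
        connection_options={
            "paths": ["s3://your-bucket-name/AWSDynamoDB/01234562352725-cb24aab6/data/"]
        },
        transformation_ctx="AmazonS3_node123456789",
    )

    # Custom transform node -- hands the source frame to MyTransform as a collection
    Transform_node = MyTransform(
        glueContext,
        DynamicFrameCollection({"AmazonS3_node123456789": AmazonS3_node123456789}, glueContext),
    )

    job.commit()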

    Replace "AmazonS3_node123456789" with your S3 node ID (you can find the ID in the generated script) and update the mappings list with your table's fields. Also, don't forget to replace "<destination-dynamo-tablename>" with your destination DynamoDB table name.
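
    As for step 1, in case it helps others: the export itself can be kicked off with boto3, assuming point-in-time recovery is enabled on the source table. The region, table ARN and bucket below are placeholders.

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")  # adjust region

    # Placeholder table ARN and bucket -- replace with your own
    response = dynamodb.export_table_to_point_in_time(
        TableArn="arn:aws:dynamodb:us-east-1:111122223333:table/source-table",
        S3Bucket="your-bucket-name",
        S3Prefix="AWSDynamoDB",
        ExportFormat="DYNAMODB_JSON",
    )

    # The export runs asynchronously; check its status until it reaches COMPLETED
    print(response["ExportDescription"]["ExportStatus"])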