amazon-web-services amazon-dynamodb aws-glue

How to see progress when using Glue to export DynamoDB table


I'm trying to export every item in a DynamoDB table to S3. I found this tutorial, https://aws.amazon.com/blogs/big-data/how-to-export-an-amazon-dynamodb-table-to-amazon-s3-using-aws-step-functions-and-aws-glue/, and followed its example. In essence, the job does this:

table = glueContext.create_dynamic_frame.from_options(
  "dynamodb",
  connection_options={
    "dynamodb.input.tableName": table_name,
    "dynamodb.throughput.read.percent": read_percentage,
    "dynamodb.splits": splits
  }
)

glueContext.write_dynamic_frame.from_options(
  frame=table,
  connection_type="s3",
  connection_options={
    "path": output_path
  },
  format=output_format,
  transformation_ctx="datasink"
)

I tested it on a tiny table in a non-production environment and it works fine. But my DynamoDB table in production is over 400 GB, with 200 million items. I expect the export to take a while, but I have no idea how long: hours, or even days? Is there any way to show progress, for example a count of how many items have been processed? I don't want to start this job blindly and wait.


Solution

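  • One way is to enable continuous logging for your AWS Glue job (the --enable-continuous-cloudwatch-log job parameter) and follow the job's progress in CloudWatch Logs while it runs.

    Another way is to configure Amazon S3 event notifications on the output bucket so that a Lambda function is triggered whenever the job stores a file in S3.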
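
The S3 notification approach could be sketched as a Lambda handler like the one below. This is a minimal sketch, not a tested production setup: the function names summarize_s3_event and lambda_handler and the log field names are hypothetical, and it assumes the output bucket is already configured to send object-created notifications to this function. It reports files and bytes written, not DynamoDB item counts.

```python
import json
import urllib.parse


def summarize_s3_event(event):
    """Return (key, size) pairs for each object-created record in an S3 event."""
    written = []
    for record in event.get("Records", []):
        # Skip anything that is not an object-created notification.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        obj = record["s3"]["object"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(obj["key"])
        written.append((key, obj.get("size", 0)))
    return written


def lambda_handler(event, context):
    """Log one line per exported file so progress shows up in CloudWatch Logs."""
    written = summarize_s3_event(event)
    for key, size in written:
        # Field names here are illustrative assumptions, not a Glue convention.
        print(json.dumps({"exported_file": key, "bytes": size}))
    return {"files": len(written), "bytes": sum(size for _, size in written)}
```

Summing the logged byte counts over time gives a rough sense of how fast the export is moving. Note that the size of the files written to S3 depends on the output format, so it will not match the table size exactly.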