python amazon-web-services amazon-dynamodb boto3

Fastest method to create and fill a new column in DynamoDB with Python & boto3


I am trying to add a new column (attribute) with values to multiple DynamoDB tables. The DDB tables contain over 10 million items. I can't use the boto3 BatchWriteItem method because it overwrites the entire item, and I need to preserve the existing attributes.

I've attempted to use the UpdateItem boto3 method but it is very slow for updating this many items.

Questions:

  1. Is there a way to batch together multiple UpdateItem calls instead of having to send millions of calls?
  2. What other methods can I utilize to speed up the process of updating all these items with the new column?

Any help is appreciated, thank you!


Solution

  • There are some ways you can speed things up:

    1. Use BatchExecuteStatement to send up to 25 PartiQL UPDATE statements in a single request. Each statement changes only the attributes it names, so the rest of the item is preserved.

    2. Use Parallel Scan to retrieve the keys, and use multiple threads to parallelize the updates.

    3. Use AWS Glue for distributed compute. This is similar to #2, but Spark's distributed execution gives you a lot more processing power.
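
Options 1 and 2 can be combined. Below is a minimal sketch, not a definitive implementation: each thread scans one segment for keys and sends PartiQL UPDATE statements in batches of 25. It assumes a table with a simple partition key (no sort key) and a low-level client such as `boto3.client("dynamodb")`, so values use the typed form like `{"S": "text"}`. The function names, the key name `pk`, and the default of 8 segments are illustrative choices, not anything taken from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

BATCH_LIMIT = 25  # BatchExecuteStatement accepts at most 25 statements per request


def chunked(iterable, size=BATCH_LIMIT):
    """Yield lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def build_statements(table_name, key_name, keys, attr_name, attr_value):
    """Build one PartiQL UPDATE per key. Assumes a partition-key-only table."""
    statement = f'UPDATE "{table_name}" SET "{attr_name}" = ? WHERE "{key_name}" = ?'
    return [{"Statement": statement, "Parameters": [attr_value, key]} for key in keys]


def update_segment(client, table_name, key_name, attr_name, attr_value,
                   segment, total_segments):
    """Scan one segment for keys and update its items in batches of 25."""
    scan_kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
        # Fetch only the key attribute to keep the scan cheap.
        "ProjectionExpression": "#k",
        "ExpressionAttributeNames": {"#k": key_name},
    }
    failed = []
    while True:
        page = client.scan(**scan_kwargs)
        keys = [item[key_name] for item in page.get("Items", [])]
        statements = build_statements(table_name, key_name, keys, attr_name, attr_value)
        for batch in chunked(statements):
            response = client.batch_execute_statement(Statements=batch)
            # Statements succeed or fail individually; collect failures for a retry.
            for stmt, result in zip(batch, response.get("Responses", [])):
                if "Error" in result:
                    failed.append(stmt)
        if "LastEvaluatedKey" not in page:
            return failed
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def update_table(client, table_name, key_name, attr_name, attr_value, total_segments=8):
    """Run one worker thread per scan segment and return all failed statements."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(update_segment, client, table_name, key_name,
                        attr_name, attr_value, seg, total_segments)
            for seg in range(total_segments)
        ]
        return [stmt for future in futures for stmt in future.result()]


# Example call (hypothetical table and attribute names):
#   import boto3
#   failed = update_table(boto3.client("dynamodb"), "my-table", "pk",
#                         "new_attr", {"S": "default"})
```

The returned list holds the statements that came back with an error (for example, throttling), so they can be resubmitted. Batching reduces the number of requests, but each update still consumes write capacity, so the table's provisioned or on-demand limits remain the ceiling on throughput.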