amazon-s3, aws-lambda, amazon-dynamodb

Efficient solution to load millions of records into DynamoDB


I am looking for suggestions from experts on a use case where we are migrating data to DynamoDB. The volume we expect is 10 to 15 million records. These records will arrive in chunks spread over multiple files in S3, and I expect 30-plus files. When a file lands in S3, it triggers a Lambda function that processes the records and pushes them to DynamoDB.
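For context, a minimal sketch of what each per-file handler could look like, assuming newline-delimited JSON records and a placeholder table name (`MigrationTable`):

```python
import json
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MigrationTable")  # placeholder table name

def handler(event, context):
    # One invocation per S3 object-created event
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # batch_writer groups puts into 25-item BatchWriteItem calls
        # and retries unprocessed items automatically
        with table.batch_writer() as batch:
            for line in body.splitlines():
                if line.strip():
                    batch.put_item(Item=json.loads(line))
```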

The Lambda function is already set up to process the records in each file. My main concerns are:

  1. What if 30 to 40 files arrive one after another? How can I ensure there are enough Lambda instances to receive the triggers and process the files? Should I be setting enough reserved concurrency (RC)?

  2. What will happen if I set RC to 20 and more than 20 files arrive?

  3. What is the best practice to make sure all files are processed? Processing them in parallel seems like the best option, given that the numbers we expect are huge.

Looking for your suggestions.


Solution

    1. Reserved concurrency can help ensure capacity is allocated to your function promptly; however, all files will eventually be processed regardless, because throttled asynchronous invocations (such as S3 event triggers) are retried.
    2. Each account has a default soft concurrency limit of 1,000, shared by all Lambda functions in the account. If you reserve 20 for this function, no other function can use that slice and 20 concurrent executions are always available to it. Reserved concurrency also acts as a cap: if more than 20 files trigger the function at once, the extra invocations are throttled and retried rather than drawing from the shared account limit. A sketch of setting this value follows the list.
    3. Reserved concurrency will help, but also make sure your file sizes are small enough that each Lambda invocation stays short. You may also consider pre-warming your DynamoDB table so it can handle the large load (see the second sketch below). Evenly distributing inserts across partition keys is also a must, to avoid hot partitions.
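
As a rough illustration of point 2, reserved concurrency can be set through the Lambda API; the function name and the value of 20 are placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserve 20 concurrent executions for the loader function.
# This guarantees 20 slots and also caps the function at 20.
lambda_client.put_function_concurrency(
    FunctionName="s3-to-dynamodb-loader",  # placeholder function name
    ReservedConcurrentExecutions=20,
)
```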
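
For point 3, if the table uses provisioned capacity, one hedged way to pre-warm it is to raise the write capacity before the migration and lower it afterwards. Table name and capacity figures below are placeholders; with on-demand mode this step is not needed, though on-demand tables can still throttle briefly on a sudden spike well above their previous peak.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Temporarily raise write capacity before the migration starts.
# Size the WCUs to the aggregate write rate expected across all
# concurrent Lambda invocations.
dynamodb.update_table(
    TableName="MigrationTable",
    ProvisionedThroughput={
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 10000,
    },
)
```

Remember to scale the capacity back down (or re-enable auto scaling) once the migration completes.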