Tags: lambda, amazon-dynamodb, amazon-dynamodb-streams

How to use DynamoDB Stream and lambda functions to sync multiple tables


I have a very common use case: I need to keep two DynamoDB tables in sync. The flow is as follows.

  1. A Job is created in the Job table.
  2. Multiple Requests are created in the Request table. All of them are created from one Job (a many-to-one relationship).
  3. Requests are processed by some other workers.
  4. Each request is marked as done independently in the Request table.
  5. When all requests for a given job are finished, the job is marked as done in the Job table.

Right now, my plan is to enable a stream on the Request table. Whenever a request is finished, it will trigger a Lambda function that checks whether all the requests for that job are finished.

I have read a lot of documentation and found several limitations of this approach:

  1. It seems that stream + Lambda only guarantees that records from a stream shard trigger the Lambda function at least once, not exactly once. Thus the Lambda function has to be idempotent. (Having the Lambda function increment a counter of finished requests will not work here.)

So I think I have to scan the Request table every time the Lambda function is triggered. Will this approach have too much overhead?

  2. A DynamoDB stream seems to distribute events across different shards, and each shard triggers a Lambda function once it fills up. I am not sure what happens if a shard stays half full for a long time (no new events on the table). Will it still trigger the Lambda function at some point?

I am also open to all other solutions that could solve this problem. I am not sure if I follow the best practice here.


Solution

  • I think you can solve it by using another DynamoDB table.

    You can create a separate table in DynamoDB:

    FinishedTasks
      JobId - partition key - id of a job
      FinishedRequestId - sort key - id of a finished request
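
As a rough sketch of that key schema, the helper below builds the keyword arguments for boto3's `create_table` call without sending anything to AWS. The function name, the string type for both ids, and the on-demand billing mode are assumptions made for illustration, not part of the answer.

```python
# Hypothetical helper: builds the keyword arguments for boto3's
# DynamoDB `create_table` call. Nothing is sent to AWS here.
def finished_tasks_table_definition(table_name="FinishedTasks"):
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "JobId", "KeyType": "HASH"},               # partition key
            {"AttributeName": "FinishedRequestId", "KeyType": "RANGE"},  # sort key
        ],
        # Assumption: both ids are strings ("S"); use "N" for numeric ids.
        "AttributeDefinitions": [
            {"AttributeName": "JobId", "AttributeType": "S"},
            {"AttributeName": "FinishedRequestId", "AttributeType": "S"},
        ],
        # Assumption: on-demand billing, so no throughput has to be provisioned.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With boto3 installed and credentials configured, this could be passed on as `boto3.client("dynamodb").create_table(**finished_tasks_table_definition())`.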

    Every Lambda invocation would do the following:

    1. Read the new item from the stream
    2. Write a new item to FinishedTasks
    3. Read all finished tasks for the job id
    4. Check whether all tasks are finished
    5. If all tasks are finished, do what is necessary

    In this case the task is idempotent (it does not matter if you overwrite an item in FinishedTasks twice).
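
The five steps and their idempotency can be sketched as follows. To keep the example runnable without AWS, an in-memory class stands in for the FinishedTasks table; in a real function its two methods would roughly correspond to a put and a query on the table. The attribute names (`jobId`, `requestId`, `status`), the `DONE` value, and the `expected_requests` mapping are assumptions: the answer does not say how the total number of requests per job is known. The sketch also assumes the stream is configured to include the new item image.

```python
from collections import defaultdict


class InMemoryFinishedTasks:
    """Stand-in for the FinishedTasks table, keyed by (JobId, FinishedRequestId)."""

    def __init__(self):
        self._items = defaultdict(set)

    def put(self, job_id, request_id):
        # Like a put on the same key: writing twice still leaves one item.
        self._items[job_id].add(request_id)

    def finished_for(self, job_id):
        # Corresponds to a query on the JobId partition key.
        return set(self._items[job_id])


def handle_stream_event(event, finished_tasks, expected_requests):
    """Process stream records and return the ids of jobs that are complete.

    expected_requests maps a job id to its total number of requests
    (an assumption; the source does not say where this number lives).
    """
    completed = set()
    for record in event["Records"]:                       # step 1
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        image = record["dynamodb"]["NewImage"]
        # Assumed attribute names and status value on the Request item.
        if image.get("status", {}).get("S") != "DONE":
            continue
        job_id = image["jobId"]["S"]
        request_id = image["requestId"]["S"]
        finished_tasks.put(job_id, request_id)            # step 2
        done = finished_tasks.finished_for(job_id)        # step 3
        if len(done) >= expected_requests[job_id]:        # step 4
            completed.add(job_id)                         # step 5
    return completed
```

Delivering the same record twice leaves the set of finished requests unchanged, so a duplicate cannot push a job over the threshold early. A duplicate that arrives after the job is complete will report the job again, so whatever runs in step 5 should itself be safe to repeat.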

    Of course, you need to remove old items from FinishedTasks. You can use the TTL feature to remove old items automatically.
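
A minimal sketch of the TTL part, assuming a hypothetical `ExpiresAt` attribute and a seven-day retention period (neither comes from the answer). DynamoDB TTL reads a Number attribute holding a Unix epoch time in seconds, so each item written to FinishedTasks would carry its own expiry timestamp.

```python
import time

TTL_ATTRIBUTE = "ExpiresAt"  # hypothetical attribute name


def finished_task_item(job_id, request_id, now=None, keep_days=7):
    """Build a FinishedTasks item that expires keep_days after `now`."""
    now = time.time() if now is None else now
    return {
        "JobId": job_id,
        "FinishedRequestId": request_id,
        # TTL expects a Number holding a Unix epoch time in seconds.
        TTL_ATTRIBUTE: int(now) + keep_days * 24 * 60 * 60,
    }


def ttl_specification(table_name="FinishedTasks"):
    """Keyword arguments for the boto3 client call `update_time_to_live`."""
    return {
        "TableName": table_name,
        "TimeToLiveSpecification": {
            "Enabled": True,
            "AttributeName": TTL_ATTRIBUTE,
        },
    }
```

Expired items are deleted in the background some time after the timestamp passes, not at that exact second, so the completeness check should not rely on expired items having already disappeared.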