
DynamoDB Stream Architecture


I have a DynamoDB table, and I want to consume the data from it and put it into our data store.

I do not want to access the DynamoDB table directly. Instead, I want to add a Lambda function that would listen for any change in the table and write it to a DynamoDB stream.

The part where I seek help is:

  1. I can consume directly from the DynamoDB stream using the Python boto3 client and load the data into the data store.

  2. I can add Kafka/SNS in between and consume from Kafka.
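Option 1 can be sketched with boto3's low-level `dynamodbstreams` client. This is only a sketch: the stream ARN is a placeholder, there is no checkpointing (which you would need to survive failures), and the hand-rolled `deserialize` helper covers only the common attribute types (boto3's `TypeDeserializer` handles the full set):

```python
import time


def deserialize(av):
    """Convert a DynamoDB attribute value (e.g. {"S": "x"}) to a plain Python value.
    Covers only the common types; boto3's TypeDeserializer handles the rest."""
    (tag, val), = av.items()
    if tag == "S":
        return val
    if tag == "N":
        return float(val) if "." in val else int(val)
    if tag == "BOOL":
        return val
    if tag == "NULL":
        return None
    if tag == "L":
        return [deserialize(v) for v in val]
    if tag == "M":
        return {k: deserialize(v) for k, v in val.items()}
    raise ValueError(f"unhandled attribute type: {tag}")


def new_images(records):
    """Extract the post-change item from each stream record as a plain dict.
    Records without a NewImage (e.g. deletes with KEYS_ONLY) are skipped."""
    return [
        {k: deserialize(v) for k, v in r["dynamodb"]["NewImage"].items()}
        for r in records
        if "NewImage" in r.get("dynamodb", {})
    ]


def poll_stream(stream_arn, handle_batch):
    """Poll every shard of a DynamoDB stream and hand batches to handle_batch.
    Placeholder loop: no checkpointing, no re-sharding handling."""
    import boto3  # imported here so the helpers above stay dependency-free

    client = boto3.client("dynamodbstreams")
    shards = client.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"]
    for shard in shards:
        it = client.get_shard_iterator(
            StreamArn=stream_arn,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",
        )["ShardIterator"]
        while it:
            resp = client.get_records(ShardIterator=it)
            if resp["Records"]:
                handle_batch(new_images(resp["Records"]))
            it = resp.get("NextShardIterator")
            time.sleep(1)  # avoid hammering GetRecords
```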

My only concern is that shard data is removed from the DynamoDB stream after 24 hours. In case of failure, how would I recover?

Which is the better option, 1 or 2?


Solution

  • There seems to be a misconception: you don't need a Lambda function to put anything into a DynamoDB stream. DynamoDB does that for you once streams are enabled on the table.

    You can choose to set up the "traditional" DynamoDB stream (24-hour retention) or a Kinesis Data Stream (retention of up to 7 days). Since you're primarily worried about the 24-hour limitation, you may want to use the Kinesis Data Stream and process the data from there.
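The one-time setup for the Kinesis option can be sketched with boto3. The table name, stream name, and single-shard sizing below are placeholder assumptions; the retention check enforces the 1–7 day window discussed above (Kinesis itself accepts longer, at extra cost):

```python
def retention_hours(days):
    """Convert a retention period in days to hours, enforcing the
    24h-7d window discussed above."""
    hours = days * 24
    if not 24 <= hours <= 168:
        raise ValueError("retention must be between 1 and 7 days for this setup")
    return hours


def enable_kinesis_destination(table_name, stream_name, days=7):
    """One-time setup: create the Kinesis stream, raise its retention,
    and point the DynamoDB table's change stream at it."""
    import boto3  # local import so retention_hours stays testable offline

    kinesis = boto3.client("kinesis")
    dynamodb = boto3.client("dynamodb")

    kinesis.create_stream(StreamName=stream_name, ShardCount=1)
    kinesis.get_waiter("stream_exists").wait(StreamName=stream_name)
    kinesis.increase_stream_retention_period(
        StreamName=stream_name,
        RetentionPeriodHours=retention_hours(days),
    )
    stream_arn = kinesis.describe_stream(StreamName=stream_name)[
        "StreamDescription"]["StreamARN"]
    dynamodb.enable_kinesis_streaming_destination(
        TableName=table_name, StreamArn=stream_arn
    )
```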

    You can even integrate with Lambda from there to write to your data store.