Tags: amazon-web-services, amazon-s3, lambda, architecture, amazon-kinesis

AWS Kinesis and Lambda data versioning


I have created an AWS Firehose endpoint (I might change it to a plain Kinesis stream) that receives logs from producers and saves them to an S3 bucket, plus a Lambda function that consumes the data, processes it, and saves the output to a database.

Everything works fine. Now I am planning a staging and development flow for this entire structure. When I release a new version, I cannot replace all producers instantly, so I need to keep older production versions running until no producer uses them any more, because I might make breaking protocol changes in new versions.

I am not sure what the best approach is for building a versionable system with Kinesis and Lambda. Should I copy the entire structure for each new version (including dev and staging) and have producers write to a specific versioned stream?

Or should I create an intermediate Lambda function that inspects packets (which contain their version info) and writes events to an S3 bucket with versioned folders, so that each Lambda function consumes only the data it knows about? This would let me use the versioning support for Lambda functions.

Here is a diagram of the first idea:

[Image: Separate flows for each version]

Here is the second structure:

[Image: Single common flow for all versions]

Which of these is the better solution, or are there better ways to accomplish this?


Solution

  • First, a Lambda function can be triggered directly by a Kinesis stream, so there is no need for Kinesis Firehose or S3 in between.

    Second, your question really boils down to whether you need a separate Kinesis+Lambda pipeline per version. I'd go with the following solution:

    • One Kinesis stream for all versions of data.
    • One Lambda function on this stream. It internally handles different versions separately. Crudely speaking, think of various if-else checks on version number.
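A minimal sketch of such a handler, assuming each record is a JSON object carrying a numeric `version` field. The payload field names (`msg`, `message`, `level`) and the `handle_v1`/`handle_v2` helpers are hypothetical and only illustrate a breaking change between versions. The `Records[].kinesis.data` layout with base64-encoded data is the event shape Lambda delivers for a Kinesis trigger.

```python
import base64
import json


def handle_v1(payload):
    # Hypothetical v1 format: the log text lives under "msg".
    return {"version": 1, "message": payload["msg"]}


def handle_v2(payload):
    # Hypothetical v2 format: a breaking change renamed "msg" to "message"
    # and added an optional "level" field.
    return {
        "version": 2,
        "message": payload["message"],
        "level": payload.get("level", "info"),
    }


# One entry per version still in production; remove an entry once no
# producer sends that version any more.
HANDLERS = {1: handle_v1, 2: handle_v2}


def lambda_handler(event, context):
    results = []
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        payload = json.loads(raw)
        version = payload.get("version")
        handler = HANDLERS.get(version)
        if handler is None:
            # Unknown version: log and skip rather than fail the whole batch.
            print(f"skipping record with unsupported version {version!r}")
            continue
        results.append(handler(payload))
    # A real function would write the results to the database here.
    return results
```

A lookup table does the same job as a chain of if-else checks, but adding or retiring a version only changes `HANDLERS`.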

    The advantages of the above approach over one Kinesis+Lambda pipeline per version:

    • The former is operationally simpler. With the latter, you'll need to set up a new pipeline every time a new version is introduced.
    • At any point in time, you'd have only a small number of active versions, so a few if-else checks in code should work just fine.

    Of course, keep the Dev and Prod pipelines separate, so as to minimize the blast radius of bad code in the former.