I have a Kinesis stream with three shards, and the producer applications write to it using a random partition key. In the Kinesis metrics I found that the GetRecords.Records count is almost double the PutRecords.Records count.
To be precise, the counts per minute were:
PutRecords.Records: 10749
GetRecords.Records: 21496
Because of this, the Kinesis record processor (a Lambda in my case) receives a lot of duplicate records.
The AWS documentation does say there can be some duplicates because of the "at least once" delivery semantics of Kinesis streams, but GetRecords being almost exactly double does not look right. Here is the relevant part of my SAM template for the processor Lambda:
MyStream:
  Type: AWS::Kinesis::Stream
  Properties:
    Name: my-stream
    ShardCount: 3

MyStreamProcessorLambda:
  Type: AWS::Serverless::Function
  Properties:
    FunctionName: my-stream-processor-lambda
    ...
    Events:
      Stream:
        Type: Kinesis
        Properties:
          Stream: !Ref MyStream
          BatchSize: 10000
          StartingPosition: LATEST
          MaximumBatchingWindowInSeconds: 300
The processor Lambda is now idempotent: a DB uniqueness constraint guarantees that the same record from a batch is never inserted twice. The Lambda's error rate is also 0 (no errors or retries by the Lambda).
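As a sketch of that idempotency approach (using SQLite for illustration; the table and column names are hypothetical, not from my actual schema), a uniqueness constraint on the Kinesis sequence number plus an insert that ignores constraint violations makes reprocessing a duplicate record a no-op:

```python
import sqlite3

# Hypothetical table: one row per Kinesis record, keyed by its sequence number.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_records ("
    "  sequence_number TEXT PRIMARY KEY,"  # the constraint that enforces idempotency
    "  payload TEXT NOT NULL"
    ")"
)

def insert_record(sequence_number: str, payload: str) -> bool:
    """Insert a record; return False if it was already processed (a duplicate)."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_records (sequence_number, payload) "
        "VALUES (?, ?)",
        (sequence_number, payload),
    )
    conn.commit()
    # rowcount == 0 means the PRIMARY KEY constraint suppressed a duplicate insert.
    return cur.rowcount == 1

# First delivery inserts; a redelivery of the same record is silently skipped.
first = insert_record("49590338271490256608559692538361571095921575", '{"id": 1}')
second = insert_record("49590338271490256608559692538361571095921575", '{"id": 1}')
```

In PostgreSQL the equivalent would be `INSERT ... ON CONFLICT DO NOTHING`; the point is that the database, not the Lambda code, decides whether a record is new.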
GetRecords.IteratorAgeMilliseconds in the stream metrics is 0 as well, and both ReadProvisionedThroughputExceeded and WriteProvisionedThroughputExceeded are 0.
I would therefore like to get rid of the duplicate records in the batches the Lambda processes.
Has any configuration gone wrong here?
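In the meantime, the batch itself can be de-duplicated inside the handler before any DB work is done. A minimal sketch, assuming the standard Kinesis event shape that Lambda receives (`Records[].kinesis.sequenceNumber` uniquely identifies a record within a shard):

```python
def dedupe_batch(event: dict) -> list:
    """Return the batch's records with exact duplicates (same shard sequence
    number) removed, preserving first-seen order."""
    seen = set()
    unique = []
    for record in event["Records"]:
        seq = record["kinesis"]["sequenceNumber"]
        if seq not in seen:
            seen.add(seq)
            unique.append(record)
    return unique

# Example event with one duplicated record (data values are base64, as Lambda
# delivers them).
event = {
    "Records": [
        {"kinesis": {"sequenceNumber": "100", "data": "aGVsbG8="}},
        {"kinesis": {"sequenceNumber": "101", "data": "d29ybGQ="}},
        {"kinesis": {"sequenceNumber": "100", "data": "aGVsbG8="}},
    ]
}
```

This only removes duplicates within a single batch; the DB constraint is still needed for duplicates that arrive in different batches.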
It seems that when an enhanced fan-out consumer is registered on the stream, the records it receives via SubscribeToShard are also counted, on top of the records the Lambda reads via GetRecords; each consumer gets its own copy of every record. This is why the GetRecords count is almost twice the PutRecords count.
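To check whether such a consumer is registered, the stream's registered consumers can be listed with the `ListStreamConsumers` API. A sketch (the client is passed in, so in practice it would be a boto3 `kinesis` client; the stream ARN below is a placeholder):

```python
def registered_consumer_names(kinesis_client, stream_arn: str) -> list:
    """Return the names of enhanced fan-out consumers registered on a stream.

    `kinesis_client` is expected to expose list_stream_consumers() the way a
    boto3 "kinesis" client does; NextToken pagination is handled.
    """
    names = []
    token = None
    while True:
        kwargs = {"StreamARN": stream_arn}
        if token:
            kwargs["NextToken"] = token
        resp = kinesis_client.list_stream_consumers(**kwargs)
        names.extend(c["ConsumerName"] for c in resp.get("Consumers", []))
        token = resp.get("NextToken")
        if not token:
            return names
```

With boto3 this would be called as `registered_consumer_names(boto3.client("kinesis"), stream_arn)`; any names returned mean another consumer is receiving every record via SubscribeToShard in addition to the Lambda's reads.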