
Too many GetRecords compared to PutRecords in Kinesis Stream, resulting in duplicate records


I have a Kinesis Stream with three shards, and the producer applications write to the stream with a random partition key. In the Kinesis metrics, I found that the GetRecords.Records count is almost double the PutRecords.Records count.

To be precise, the counts (per minute) are:

PutRecords.Records: 10749

GetRecords.Records: 21496

Because of this, the Kinesis record processor, which is a Lambda function in my case, is receiving a lot of duplicate records.

The AWS documentation does say that some duplicates can occur because of the "at least once" delivery semantics of Kinesis Streams, but GetRecords being almost exactly double does not look like that. Here is my SAM template for the processor Lambda.

  MyStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: my-stream
      ShardCount: 3

  MyStreamProcessorLambda:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: my-stream-processor-lambda
      ...
      Events:
        Stream:
          Type: Kinesis
          Properties:
            Stream: !Ref MyStream
            BatchSize: 10000
            StartingPosition: LATEST
            MaximumBatchingWindowInSeconds: 300

The processor Lambda is already idempotent: a DB uniqueness constraint ensures that the same record from a batch is never inserted twice. The Lambda's error rate is also 0 (no errors or retries by the Lambda).
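For reference, the DB-constraint idempotency described above can be sketched like this. This is a minimal illustration using SQLite and a hypothetical `events` table; the real processor would use its own database and schema, with the record's sequence number (or a business key) as the unique column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (record_id TEXT PRIMARY KEY, payload TEXT)"
)

def insert_idempotent(record_id, payload):
    # INSERT OR IGNORE skips rows that violate the PRIMARY KEY
    # constraint, so a re-delivered record is silently dropped
    # instead of raising an error.
    conn.execute(
        "INSERT OR IGNORE INTO events (record_id, payload) VALUES (?, ?)",
        (record_id, payload),
    )

# The same record delivered twice is stored only once.
insert_idempotent("seq-1", "hello")
insert_idempotent("seq-1", "hello")
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 1
```

This keeps the database correct even when duplicates arrive, but as the question notes, it does not stop the duplicates from being delivered and processed in the first place.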

Also, GetRecords.IteratorAgeMilliseconds in the stream metrics is 0, and the Read/Write ProvisionedThroughputExceeded metrics are both 0.

Thus, I would like to get rid of those duplicate records in the batch that the Lambda is processing.
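One stopgap, independent of the root cause, is to deduplicate within each batch inside the handler before processing. A minimal sketch, assuming the standard Kinesis event shape where each record carries an `eventID` (shard ID plus sequence number, unique per record within a shard); the sample event below is hypothetical:

```python
def dedupe_batch(event):
    """Drop duplicate records within one Kinesis Lambda batch,
    keyed on the per-record eventID."""
    seen = set()
    unique = []
    for record in event.get("Records", []):
        key = record["eventID"]
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical batch in which one record was delivered twice:
event = {"Records": [
    {"eventID": "shardId-000000000000:1", "kinesis": {"data": "YQ=="}},
    {"eventID": "shardId-000000000000:1", "kinesis": {"data": "YQ=="}},
    {"eventID": "shardId-000000000000:2", "kinesis": {"data": "Yg=="}},
]}
print(len(dedupe_batch(event)))  # 2
```

Note that this only removes duplicates that land in the same batch; duplicates split across invocations still need the DB constraint (or similar idempotency) as a backstop.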

Is there any configuration that has gone wrong here?


Solution

  • It seems that with a Kinesis enhanced fan-out consumer, records delivered via SubscribeToShard are also counted alongside the records returned by GetRecords calls. This is why the GetRecords count is almost twice the PutRecords count.
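The metric numbers in the question are consistent with this explanation: each record is being counted as read roughly twice, once per read path. A quick sanity check on the figures from the question:

```python
put_records = 10749  # PutRecords.Records per minute (from the question)
get_records = 21496  # GetRecords.Records per minute (from the question)

# If every record is read once per consumer / read path, the ratio of
# reads to writes approximates the number of read paths on the stream.
read_paths = round(get_records / put_records)
print(read_paths)  # 2
```

A ratio of ~2 suggests two read paths (e.g. the standard shared-throughput consumer plus one enhanced fan-out consumer) rather than an anomalous level of at-least-once redelivery, which would not track the write rate this closely.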