amazon-kinesis amazon-dynamodb-streams amazon-kcl

How does the LATEST position in the stream work in Kinesis / KCL?


We are building a service based on Kinesis / DynamoDB Streams, and we have a question about how checkpoints behave.

We have a worker that starts with the configuration withInitialPositionInStream(InitialPositionInStream.LATEST), and the KCL application name is always the same.

What we have observed when turning the worker off and on again is that it does not start consuming from the end of the stream. We have a lag metric, and when the worker comes back up the consumption lag is hours, whereas we expect it to be under 1 second, since these are messages we are producing right now.

  • Is this an expected behavior?
  • Are we misinterpreting how the LATEST works?

Thank you very much.


Solution

  • As the documentation for InitialPositionInStream states,

    Used to specify the position in the stream where a new application should start from. This is used during initial application bootstrap (when a checkpoint doesn't exist for a shard or its parents).

    So it is used only when a new application is bootstrapped, and with LATEST the worker starts just after the most recent data record. That applies only when no checkpoint exists for a shard or its parents.

    So if you turn your worker off and on again, it is not expected to start from LATEST anymore; instead it resumes from the last checkpointed sequence number of each shard.

    KCL does not checkpoint automatically. On restart, every record processed since the last checkpoint is read again, so if your worker starts with hours of lag, you are most likely checkpointing too rarely.
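The resume behavior described in the answer can be modeled in a few lines of Java. This is a minimal, self-contained sketch, not KCL code: StartPositionModel, resolveStart and CheckpointTimer are hypothetical names, the in-memory map stands in for the KCL lease table, and parent-shard checkpoints are ignored for simplicity. It shows that the initial position is consulted only when a shard has no checkpoint, plus a simple time-based policy for deciding when to checkpoint.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of how a KCL worker chooses where to start reading a shard.
 * Not the KCL API: all names here are illustrative assumptions.
 */
public class StartPositionModel {

    // Stand-in for the KCL lease table: shardId -> last checkpointed sequence number.
    private final Map<String, String> checkpoints = new HashMap<>();

    // The configured initial position, e.g. "LATEST" or "TRIM_HORIZON".
    private final String initialPosition;

    public StartPositionModel(String initialPosition) {
        this.initialPosition = initialPosition;
    }

    // Records a checkpoint; the real KCL persists this in its DynamoDB lease table.
    public void checkpoint(String shardId, String sequenceNumber) {
        checkpoints.put(shardId, sequenceNumber);
    }

    // The initial position is used only when the shard has no checkpoint yet.
    // (Parent-shard checkpoints, which the real KCL also considers, are ignored here.)
    public String resolveStart(String shardId) {
        String sequenceNumber = checkpoints.get(shardId);
        if (sequenceNumber != null) {
            return "AFTER_SEQUENCE_NUMBER " + sequenceNumber;
        }
        return initialPosition;
    }

    // Time-based checkpoint policy: a checkpoint is due at most once per interval.
    public static final class CheckpointTimer {
        private final long intervalMillis;
        private long nextCheckpointAtMillis;

        public CheckpointTimer(long intervalMillis, long nowMillis) {
            this.intervalMillis = intervalMillis;
            this.nextCheckpointAtMillis = nowMillis + intervalMillis;
        }

        // Returns true, and schedules the next checkpoint, if one is due at nowMillis.
        public boolean shouldCheckpoint(long nowMillis) {
            if (nowMillis < nextCheckpointAtMillis) {
                return false;
            }
            nextCheckpointAtMillis = nowMillis + intervalMillis;
            return true;
        }
    }

    public static void main(String[] args) {
        StartPositionModel app = new StartPositionModel("LATEST");
        String shard = "shardId-000000000000";

        // First run of a new application: no checkpoint, so LATEST applies.
        System.out.println(app.resolveStart(shard)); // prints "LATEST"

        // The worker processes records and checkpoints once.
        app.checkpoint(shard, "12345");

        // Restart with the same application name: the checkpoint wins over LATEST.
        System.out.println(app.resolveStart(shard)); // prints "AFTER_SEQUENCE_NUMBER 12345"
    }
}
```

In a real record processor you would call the checkpointer that the KCL passes to your processor whenever shouldCheckpoint returns true. A time-based interval (for example, once a minute) roughly bounds how much is replayed after a restart to that interval, while avoiding a lease-table write for every batch.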