Will Kafka continue assigning sequential offsets after retention policy deletes old messages?


I’m using Kafka with a topic that has 4 partitions. The retention period (TTL) for messages in Kafka is set to the default of 7 days. I’m running a non-streaming batch job that processes data from Kafka, and I manually store the Kafka offsets after each processing run.

Here’s an example of the saved offsets after a few days of processing:

Day 1 (Offsets saved):

{
  "0": 100,
  "1": 110,
  "2": 90,
  "3": 123
}

Day 6 (Offsets saved):

{
  "0": 20000,
  "1": 21000,
  "2": 11000,
  "3": 17003
}

By Day 7, Kafka’s retention policy will kick in, and all messages older than 7 days will be automatically deleted.

My Concern:

When new data is produced to Kafka after Day 7, and the old messages have been deleted, I’m wondering what happens with the offsets.

  • Will Kafka continue assigning offsets sequentially, meaning the next new message will have offset 20001 for partition 0?
  • Or will Kafka reset the offsets for each partition back to 0 once the old messages are deleted?

The last processed offset I have stored is around 20000, and I want to make sure that starting to read from offset 20001 the next day will allow me to correctly read the newly produced messages, without encountering any issues (like Kafka reusing old offsets).


Solution

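  • Kafka never reuses an offset, even after the record it pointed to has been deleted.

    Within a partition, each new record is assigned the next sequential offset. If the last record on a partition is at offset 20000, the next record produced to that partition gets offset 20001, regardless of retention or log compaction. Deleting old records only advances the earliest available offset of the partition; it does not change the offset given to new records.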
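To make the behaviour concrete, here is a minimal toy model in plain Python. It is not Kafka's implementation and does not use any Kafka client. The names `PartitionLog`, `apply_retention`, and `resume_position` are hypothetical and exist only for illustration. It assumes the stored value is the last processed offset, as in the question, so the job resumes at that offset plus one. It also covers the case where retention has already removed records the job never read, in which case the earliest retained offset is the only sensible starting point.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of a single partition (illustration only, not Kafka code)."""
    next_offset: int = 0                          # offset the next appended record receives
    records: dict = field(default_factory=dict)   # offset -> value, retained records only

    def append(self, value):
        """Store a record under the next offset and return that offset."""
        offset = self.next_offset
        self.records[offset] = value
        self.next_offset += 1                     # the counter only ever moves forward
        return offset

    def apply_retention(self, keep_from):
        """Drop every record whose offset is below keep_from."""
        for offset in [o for o in self.records if o < keep_from]:
            del self.records[offset]

    @property
    def log_start_offset(self):
        """Earliest retained offset; equals next_offset when the log is empty."""
        return min(self.records, default=self.next_offset)


def resume_position(last_processed, log_start_offset):
    """Offset a batch job should read from next.

    Normally this is last_processed + 1. If retention has already removed
    records past that point, fall forward to the earliest retained offset.
    """
    return max(last_processed + 1, log_start_offset)


# Partition 0 from the question: records at offsets 0..20000 have been written.
log = PartitionLog()
for i in range(20001):
    log.append(f"msg-{i}")

# Retention removes every existing record.
log.apply_retention(keep_from=log.next_offset)

# A new record still gets the next offset, not 0.
new_offset = log.append("first message after retention")
print(new_offset)                                       # 20001
print(resume_position(20000, log.log_start_offset))     # 20001
```

With a real consumer the same idea applies: seek each partition to the saved offset plus one. If that position is older than the earliest retained record, the consumer's out-of-range handling (for example the `auto.offset.reset` setting) decides where reading starts.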