Search code examples
apache-kafkaapache-kafka-streams

How do KTables interact with Kafka on startup?


I'm a little confused on how this works conceptually.

How does kafka streams guarantee that the partitions assigned to it from the kafka broker match the partitions assigned for other topics? Seems like there needs to be some coordination happening? Also, does kafka streams always read the compacted topic from the beginning, or does it read from latest offset? Once it reads the messages from the compacted topic, does it commit the offset?


Solution

  • How does kafka streams guarantee that the partitions assigned to it from the kafka broker match the partitions assigned for other topics?

    Kafka streams application subscribes to one or more topics under an application.id which resembles group.id in Kafka-clients.

    When a client requests Kafka broker for subscription of a topic with a particular group.id it returns a set of partitions for that topic. If all the topics partitions are assigned to any streams instance under the same application.id, a re-balance will be triggered and the newly started streams instance will receive its share of the partitions and the old instance will no longer be listening to those partitions.

    Does kafka streams always read the compacted topic from the beginning, or does it read from latest offset?

    Whether compacted or otherwise, Kafka streams applications reads from the last committed offset.

    Once it reads the messages from the compacted topic, does it commit the offset?

    From the wiki, it is stated that..

    Kafka Streams commit the current processing progress in regular intervals (parameter commit.interval.ms). If a commit is triggered, all state stores need to flush data to disk, i.e., all internal topics needs to get flushed to Kafka. Furthermore, all user topics get flushed, too. Finally, all current topic offsets are committed to Kafka. In case of failure and restart, the application can resume processing from its last commit point (providing at-least-once processing guarantees).

    While writing a Kafka streams application, the developer need not manually take care of committing the offsets because it is internally done by Kafka streams.