
Kafka Streams Threading Model with more than one Stream on the same instance and JVM


Hi, I am trying to get a better understanding of the Kafka Streams threading model, and I am looking at this example in the Confluent docs: https://docs.confluent.io/current/streams/architecture.html#example

I understand that this example is for a single Kafka Streams app that, in the first diagram, is deployed on a single machine and allowed to use two threads (configurable). Its work is split across the two threads as 3 separate 'tasks' that, I think, all do the same thing and are simply parallelized. That much I think I understand.

My question is: what if I deploy a second, totally different Kafka Streams app with its own unique client id on that same machine and in the same JVM? Will this second app be able to share the same two threads as the first, or does the first app monopolise the threads it is allowed to use?

Another way of asking this might be: is the minimum number of threads necessary equal to the number of separate Kafka Streams apps running on the machine?


Solution

  • Threads are owned by KafkaStreams instances. Thus, if you create and start multiple KafkaStreams instances, each instance has its own threads; they are not shared.

    Btw: the number of tasks is independent of the number of KafkaStreams instances and the number of threads. The number of tasks depends on the number of partitions of your input topic as well as the structure of your topology DAG.

    Also, the number of tasks effectively limits the overall parallelism. Each task is executed by exactly one thread. If you have more threads than tasks, some threads will be idle as there is no task that can be assigned to them.

    One more thing: from a parallelism point of view, there is no difference whether you start one KafkaStreams instance configured with 3 threads or three KafkaStreams instances (with the same application.id) with one thread each. All available tasks will be evenly distributed over all available threads.
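
The arithmetic in the answer can be sketched in plain Java. This is a deliberately simplified model and not Kafka Streams' actual assignment logic. It assumes a topology with a single sub-topology, so the task count equals the number of input partitions, and it spreads tasks round-robin. The class and method names (TaskDistributionSketch, taskCount, assign, idleThreads) are hypothetical and exist only for illustration. In a real application the thread count per instance is set with the num.stream.threads config.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, NOT Kafka Streams' real assignor: it only mirrors the
// task/thread arithmetic described in the answer.
public class TaskDistributionSketch {

    // Assumption: a single sub-topology, so the task count equals the
    // number of input partitions.
    static int taskCount(int inputPartitions) {
        return inputPartitions;
    }

    // Spreads task ids round-robin over the total number of threads.
    // Each task lands on exactly one thread; a thread may get zero tasks.
    static List<List<Integer>> assign(int tasks, int totalThreads) {
        List<List<Integer>> perThread = new ArrayList<>();
        for (int t = 0; t < totalThreads; t++) {
            perThread.add(new ArrayList<>());
        }
        for (int task = 0; task < tasks; task++) {
            perThread.get(task % totalThreads).add(task);
        }
        return perThread;
    }

    // Counts threads that received no task.
    static long idleThreads(List<List<Integer>> perThread) {
        return perThread.stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        int tasks = taskCount(3);
        // One instance with 3 threads, or three instances with 1 thread each:
        // either way, 3 threads in total share the 3 tasks.
        System.out.println(assign(tasks, 1 * 3));
        System.out.println(assign(tasks, 3 * 1));
        // 5 threads for 3 tasks: the surplus threads receive nothing.
        System.out.println(idleThreads(assign(tasks, 5)));
    }
}
```

Note that the model pools threads only across instances of the same application. Two different apps in one JVM each get their own tasks and their own threads, so each app needs at least one thread of its own.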