apache-kafka-streams

What is subtopology in Kafka Streams?


I've been exploring the Kafka Streams sources and have a hard time understanding which operators result in one or many sub-topologies (and then node groups). Can someone explain what to use so that Topology.describe shows sub-topologies?

What are sub-topologies?


Solution

  • The Apache Kafka docs don't seem to describe them in detail, but there is a section about them in the Confluent docs: https://docs.confluent.io/platform/current/streams/architecture.html#stream-partitions-and-tasks

    Sub-topologies (also called sub-graphs): [...] A sub-topology is a set of processors, that are all transitively connected as parent/child or via state stores in the topology. Hence, different sub-topologies exchange data via topics and don’t share any state stores. [...]

    A node group is the internal name of a sub-topology.

    Sub-topologies are scaled independently, i.e., each sub-topology may have a different number of instantiations (so-called tasks) depending on the maximum number of partitions over all its input topics.
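
    To make the definition above concrete, here is a small plain-Java sketch (no Kafka dependency) that groups nodes into sub-topologies as connected components and computes a task count as the maximum partition count. It is a toy model, not the Kafka Streams internals: the class SubTopologySketch, its methods, and all node, store, and partition values are made up for illustration. The example graph assumes a key-changing step that writes to a repartition topic and a stateful count that reads from it; since topics are not nodes, the two halves are never connected and end up in separate groups.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model: sub-topologies as connected components of
// processors and state stores. Topics are deliberately NOT nodes.
final class SubTopologySketch {
    // Union-find parent pointers, keyed by node or store name.
    private final Map<String, String> parent = new HashMap<>();

    private String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p); // path compression
        }
        return p;
    }

    // Link a parent/child pair of processors, or a processor and its state store.
    void connect(String a, String b) {
        parent.put(find(a), find(b));
    }

    // Each returned set is one sub-topology ("node group") in this model.
    Collection<Set<String>> groups() {
        Map<String, Set<String>> byRoot = new TreeMap<>();
        for (String node : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(node), k -> new TreeSet<>()).add(node);
        }
        return byRoot.values();
    }

    // Task count: maximum partition count over all input topics of a sub-topology.
    static int taskCount(Collection<Integer> inputTopicPartitions) {
        return Collections.max(inputTopicPartitions);
    }

    // Assumed example graph: two halves that only exchange data via a topic.
    static SubTopologySketch example() {
        SubTopologySketch g = new SubTopologySketch();
        // First half: read input, change the key, write to a repartition topic.
        g.connect("source-orders", "select-key");
        g.connect("select-key", "sink-repartition");
        // Second half: read the repartition topic, count with a store, write output.
        g.connect("source-repartition", "count");
        g.connect("count", "count-store");
        g.connect("count", "sink-output");
        return g;
    }

    public static void main(String[] args) {
        for (Set<String> group : example().groups()) {
            System.out.println("sub-topology: " + group);
        }
        // Assumed input topics with 3 and 6 partitions.
        System.out.println("tasks: " + taskCount(List.of(3, 6)));
    }
}
```

    In this model, connecting any node of the first half to "count-store" would merge the two groups into one, which matches the rule that sub-topologies don't share state stores.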