Given this answer:
the maximum number of partitions over all topics determines the number of tasks.
and the code of AbstractTask with the following line:
final Set<TopicPartition> inputPartitions
I wonder when (if ever) a task can have multiple partitions assigned?
A task would have multiple input partitions if you do a join()
, merge()
, or copartition()
for example. Also if you read multiple topics at once via pattern subscription.
It's orthogonal to
the maximum number of partitions over all topics determines the number of tasks
The quote is about the number of created tasks and has nothing to do with the number of partitions per task.
Assume you have two input topic A
with 2 partitions (A-0
and A-1
) and B
with only one partition (B-0
). Your program is:
KStream a = builder.stream("A",...);
KStream b = builder.stream("B",...);
a.merge(b);
Your program is logically:
topic-A ---+
+---> merge()
topic-B ---+
For this case, you would get two tasks:
Task 0_0:
A-0 ---+
+--- merge() -->
B-0 ---+
Task 0_1:
A-1 ---+
+--- merge() -->
Note that the second task 0_1
only has one input partition, because topic B has only 1 partition.
A task is basically a copy (physical instantiation) of your (logical) program that processed all partitions with the name partition number. Because topic-A has two partitions, two tasks need to be created.