I am a bit confused about the difference between `num.standby.replicas` and `max.warmup.replicas`. They sound the same to me, since both seem to reduce the time it takes to get a standby task and its state store ready to be promoted to active during a consumer group rebalance. Thanks in advance.
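`num.standby.replicas` is a per-task setting for high availability (HA). It controls how many standby copies of each task are kept on other instances. If you lose a task (for example, because the instance running it goes down), Kafka Streams immediately promotes the standby to active.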
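For reference, here is a minimal sketch of enabling one standby per task. It uses only `java.util.Properties` with the literal config keys, which match constants such as `StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG`. The class name, application id and bootstrap address are hypothetical placeholders. A standby is never placed on the same instance as its active task, so this only helps when you run at least two instances.

```java
import java.util.Properties;

// Minimal sketch: one standby replica per task for HA.
// Key strings match StreamsConfig constants; the id and broker address
// below are hypothetical placeholders.
class StandbyConfigExample {

    static Properties standbyProps() {
        Properties props = new Properties();
        props.setProperty("application.id", "my-streams-app");     // placeholder
        props.setProperty("bootstrap.servers", "localhost:9092");  // placeholder
        // Keep one hot copy of each task's state on another instance, so
        // failover does not have to rebuild the store from the changelog first.
        props.setProperty("num.standby.replicas", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(standbyProps().getProperty("num.standby.replicas"));
    }
}
```

You would pass these properties to the `KafkaStreams` constructor as usual.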
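`max.warmup.replicas` is a "global" setting. It caps how many warmup replicas (extra standbys beyond `num.standby.replicas`) can be assigned at once, and it only applies when you scale out, that is, when you add a Kafka Streams instance with the same `application.id`.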
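As a sketch, the warmup-related settings (available since Kafka Streams 2.6) sit next to the standby setting. The literal keys match `StreamsConfig.MAX_WARMUP_REPLICAS_CONFIG` and `StreamsConfig.ACCEPTABLE_RECOVERY_LAG_CONFIG`. The values are examples chosen to mirror the scenario below, not recommendations. The class name, application id and broker address are hypothetical.

```java
import java.util.Properties;

// Minimal sketch of warmup-related settings; values are examples only,
// and the id and broker address are hypothetical placeholders.
class WarmupConfigExample {

    static Properties warmupProps() {
        Properties props = new Properties();
        props.setProperty("application.id", "my-streams-app");     // placeholder
        props.setProperty("bootstrap.servers", "localhost:9092");  // placeholder
        // Standbys for HA are configured independently of warmups.
        props.setProperty("num.standby.replicas", "1");
        // At most one extra warmup replica may be assigned at a time.
        props.setProperty("max.warmup.replicas", "1");
        // A replica counts as caught up once its lag (in offsets) is at most this.
        props.setProperty("acceptable.recovery.lag", "10000");
        return props;
    }

    public static void main(String[] args) {
        warmupProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```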
In the scale-out scenario with `max.warmup.replicas=1`, Kafka Streams would "warm up" a single task A by starting a warmup replica A' on the new node. Once A' is within the acceptable lag (`acceptable.recovery.lag`), task A migrates to the new node (A' -> A), and the process then repeats for the next task. If you set `max.warmup.replicas=2`, Kafka Streams warms up two tasks, A and B, at a time, and so on.
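To make the batching concrete, here is a hypothetical, heavily simplified model of that walkthrough. It only shows how `max.warmup.replicas` limits how many tasks warm up in each round. It is not Kafka Streams' real assignor, which also checks each replica's lag and waits for follow-up (probing) rebalances before moving tasks. `WarmupSimulation` and `warmupRounds` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: NOT the real Kafka Streams assignor.
// It only shows how max.warmup.replicas caps the tasks warmed up per round.
class WarmupSimulation {

    // Splits the tasks that should move to the new instance into successive
    // rounds. Each round warms up at most maxWarmupReplicas tasks; in this
    // model those tasks migrate before the next round starts.
    static List<List<String>> warmupRounds(List<String> tasksToMove, int maxWarmupReplicas) {
        if (maxWarmupReplicas < 1) {
            throw new IllegalArgumentException("maxWarmupReplicas must be >= 1");
        }
        List<List<String>> rounds = new ArrayList<>();
        for (int i = 0; i < tasksToMove.size(); i += maxWarmupReplicas) {
            int end = Math.min(i + maxWarmupReplicas, tasksToMove.size());
            rounds.add(new ArrayList<>(tasksToMove.subList(i, end)));
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("A", "B", "C");
        System.out.println(warmupRounds(tasks, 1)); // [[A], [B], [C]]
        System.out.println(warmupRounds(tasks, 2)); // [[A, B], [C]]
    }
}
```

A higher `max.warmup.replicas` moves tasks to the new instance in fewer rounds, at the cost of more extra state being restored in parallel.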