Tags: scala, apache-kafka, spark-streaming, confluent-platform

Is Spark aware of new partitions that get added in Kafka?


We recently had an issue where some of the Kafka partitions were lost and the job continued without failing. In the meantime, new Kafka partitions were added. Because our Spark Streaming job was not restarted, it received no data from the new partitions until we noticed a discrepancy in the counts. After restarting the job, everything was fine. So my question is: doesn't the spark-kafka streaming API periodically check whether new partitions were added? Is there a special setting to enable that?
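For context, the job in question is a DStream-based consumer. A minimal sketch of that kind of setup, assuming the spark-streaming-kafka-0-10 integration (the topic name, group id, and broker address are placeholders):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaCountsJob {
  def main(args: Array[String]): Unit = {
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",          // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "my-consumer-group",     // placeholder group id
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-counts"), Seconds(30))

    // Direct stream subscribed to a single topic; the partitions read each
    // batch are decided by the driver-side consumer.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Set("my-topic"), kafkaParams))

    // Count records per batch; this is where the count discrepancy showed up.
    stream.foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```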


Solution

  • AFAIK, Spark's Kafka Consumer will not automatically rebalance its consumer group when new topics/partitions are added.

    That's one of the benefits that gets listed when comparing Spark Streaming with Kafka Streams, in that Kafka Streams will rebalance its consumers automatically as topics and partitions change (see the Structured Streaming sketch below).
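One possible mitigation, beyond restarting the job: Spark's Structured Streaming Kafka source re-checks topic metadata on each micro-batch and, per the Spark documentation, picks up newly discovered partitions without a restart. A minimal sketch, with placeholder broker address and topic name; whether this fits depends on the Spark version in use:

```scala
import org.apache.spark.sql.SparkSession

object KafkaStructuredCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-structured-counts")
      .getOrCreate()

    // Structured Streaming's Kafka source tracks the topic's partition list,
    // so partitions added to "my-topic" are read without restarting the query.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
      .option("subscribe", "my-topic")                   // placeholder topic
      .option("startingOffsets", "latest")
      .load()

    // Print the record values to the console for a quick sanity check.
    val query = df.selectExpr("CAST(value AS STRING) AS value")
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```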