Tags: apache-kafka, kafka-consumer-api, spark-structured-streaming, spark-streaming-kafka

How to read from specific Kafka partition in Spark structured streaming


I have three partitions for my Kafka topic and I was wondering if I could read from just one of the three. My consumer is a Spark Structured Streaming application.

Below are my existing Kafka settings in Spark.

  val inputDf = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", brokers)
    .option("subscribe", topic)
    .option("startingOffsets", "latest")
    .load()

Solution

  • Here is how you can read from a specific partition: use the assign option instead of subscribe. The JSON key is your topic name and the array lists the partition ids to read from.

     val inputDf = spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", brokers)
       .option("assign", """{"topic":[0]}""") // "topic" is your topic name; [0] is the partition id
       .option("startingOffsets", "latest")
       .load()
    

    PS: To read from multiple partitions instead of one, list them in the array, e.g. """{"topic":[0,1,2]}""". A sketch combining this with per-partition starting offsets follows below.
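
    For example, a minimal sketch (assuming a topic named "my-topic" and the same brokers value) that assigns partitions 0 and 1 and pins a starting offset per partition using the JSON form of startingOffsets, where -2 means earliest and -1 means latest:

     // Hypothetical topic name "my-topic"; adjust to your own topic and partition ids.
     val multiPartitionDf = spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", brokers)
       .option("assign", """{"my-topic":[0,1]}""")
       .option("startingOffsets", """{"my-topic":{"0":-2,"1":-1}}""")
       .load()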