I would like to know whether a Kafka consumer can read specific records when the from and until offsets of a topic's partitions are known.
The use case is that in my Spark Streaming application a few batches are not processed (inserted into the table), and in that case I want to read only the missed data. I am storing the topic details, i.e. the partitions and offsets.
Can someone let me know if this can be achieved, i.e. reading from the topic when the offsets are known?
If you want to process a set of messages defined by a starting and ending offset in Spark Streaming, you can use the following code:
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka010.{KafkaUtils, OffsetRange}
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

import scala.collection.JavaConverters._

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "groupId"
)

val offsetRanges = Array(
  OffsetRange("input", 0, 2, 4) // <-- topic name, partition number, fromOffset, untilOffset
)

val sparkContext: SparkContext = ???

val rdd = KafkaUtils.createRDD[String, String](sparkContext, kafkaParams.asJava, offsetRanges, PreferConsistent)
// other processing and saving
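For the "processing and saving" part, here is a minimal sketch assuming the `rdd` from the snippet above; the text-file sink is just a placeholder, in your case you would convert the values to a DataFrame and insert them into your table instead:

// Each element is an org.apache.kafka.clients.consumer.ConsumerRecord[String, String];
// keep only the message values, which is usually what gets written out.
val values = rdd.map(record => record.value())

// Placeholder sink for illustration; swap in your own table insert logic.
values.saveAsTextFile("/tmp/missed-batch-output")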
More details on integrating Spark Streaming with Kafka can be found at https://spark.apache.org/docs/2.4.0/streaming-kafka-0-10-integration.html
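Since you mentioned you are already storing partitions and offsets: inside the streaming job itself, the per-batch ranges can be captured with HasOffsetRanges, and those are exactly the values you would later feed back into OffsetRange/createRDD to replay a missed batch. A minimal sketch, assuming `stream` is the InputDStream returned by KafkaUtils.createDirectStream in your application:

import org.apache.spark.streaming.kafka010.HasOffsetRanges

// `stream` is assumed to be the InputDStream[ConsumerRecord[String, String]]
// created with KafkaUtils.createDirectStream in your streaming application.
stream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { o =>
    // Persist topic, partition, fromOffset and untilOffset so a failed batch
    // can later be replayed with KafkaUtils.createRDD as shown above.
    println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
  }
}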