How can i micro batch events in kafka spout to reduce IO calls in the bolts that follow? The expectation is: emit a batch of maximum size 100 using events in kafka but wait maximum of 1 second to form this batch. If there are not enough events within 1 second, emit the available events.
I can achieve the same in Akka by "source.groupedWithin" method. How do i do the same with kafka spout?
In addition to Chris' answer, you can also use Storm's windowing feature https://storm.apache.org/releases/2.0.0/Windowing.html. You can find an example of this at https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/SlidingWindowTopology.java
You can alternatively use Trident for this if you like. Once you've set up a KafkaTridentSpoutOpaque
, you can use the Kafka client settings to control how many messages are in each batch. You would use the KafkaSpoutConfig
pollTimeoutMs
to set how long you want to wait for a batch to fill, and set the max.poll.records
Kafka client configuration via KafkaSpoutConfig.Builder.setProp
to control the max number of records in a batch.
For a complete example of using the Kafka Trident spout, see https://github.com/apache/storm/blob/master/examples/storm-kafka-client-examples/src/main/java/org/apache/storm/kafka/trident/TridentKafkaClientTopologyNamedTopics.java