I used Hazelcast some time ago, and I'm now trying Hazelcast Jet for the first time. It looks interesting for real-time stream processing, so I'm exploring it.
Here is my situation: I'm pulling a Kafka topic into an IMap using:
    private static Pipeline buildPipelineForClientDataa() {
        Pipeline p = Pipeline.create();
        p.drawFrom(KafkaSources.kafka(
                props("bootstrap.servers", BOOTSTRAP_SERVERS,
                      "key.deserializer", StringDeserializer.class.getCanonicalName(),
                      "value.deserializer", StringDeserializer.class.getCanonicalName(),
                      "auto.offset.reset", AUTO_OFFSET_RESET),
                KAFKA_TOPIC))
         .withoutTimestamps()
         .drainTo(Sinks.map(SINK_CLINET_DATA));
        return p;
    }
However, the records in the topic have no key. Is there a way to assign a rolling number as the key? If so, please show me how. Thanks.
Using an incrementing number isn't a good match for Jet because it's a distributed system. It works with a partitioned stream, and each stream partition should be independent. To assign sequential numbers, you would need to route all items through a single non-parallel processor.
You could use a UUID or Hazelcast's FlakeIdGenerator as the key, but if the job restarts and re-processes the Kafka topic from a snapshotted offset, the same items will be assigned different keys and will be present twice in the target map.
If you want each item to appear in the map only once, you can use the combination of Kafka topic, partition ID, and offset as the key:
    p.drawFrom(KafkaSources.kafka(
            props(...),
            record -> Util.entry(
                    Tuple3.tuple3(record.topic(), record.partition(), record.offset()),
                    record.value()),
            KAFKA_TOPIC))
You can omit the topic if you have just one topic.
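To see why the key choice matters after a restart, here is a minimal plain-Java sketch that uses neither Jet nor Kafka. It only imitates the effect: the same batch of records is written to a map twice, as if the job had replayed from a snapshotted offset. The names ReplayKeyDemo, Rec, sink, sizeAfterReplay and positionKey are hypothetical and made up for this illustration, and a HashMap stands in for the IMap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

public class ReplayKeyDemo {

    // Hypothetical stand-in for the Kafka record fields used in this thread.
    static final class Rec {
        final String topic;
        final int partition;
        final long offset;
        final String value;

        Rec(String topic, int partition, long offset, String value) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.value = value;
        }
    }

    // Writes every record under the key produced by keyFn. Like a map sink,
    // a later write with an equal key overwrites the earlier entry.
    static <K> void sink(Map<K, String> map, List<Rec> batch, Function<Rec, K> keyFn) {
        for (Rec r : batch) {
            map.put(keyFn.apply(r), r.value);
        }
    }

    // Processes the same batch twice to imitate a restart that replays
    // records, then reports how many entries ended up in the map.
    static <K> int sizeAfterReplay(List<Rec> batch, Function<Rec, K> keyFn) {
        Map<K, String> map = new HashMap<>();
        sink(map, batch, keyFn);
        sink(map, batch, keyFn);
        return map.size();
    }

    // Key derived only from the record's position, so it is stable across replays.
    static String positionKey(Rec r) {
        return r.topic + "/" + r.partition + "/" + r.offset;
    }

    public static void main(String[] args) {
        List<Rec> batch = Arrays.asList(
                new Rec("clients", 0, 0L, "a"),
                new Rec("clients", 0, 1L, "b"),
                new Rec("clients", 1, 0L, "c"));

        // A fresh random key per write: the replayed records become new entries.
        System.out.println("random keys:   " + sizeAfterReplay(batch, r -> UUID.randomUUID()));
        // Position-based keys: the replay overwrites the existing entries.
        System.out.println("position keys: " + sizeAfterReplay(batch, ReplayKeyDemo::positionKey));
    }
}
```

With three records replayed once, the random-key run ends with 6 entries and the position-key run with 3. That is the duplication described above, and it is the reason the topic+partition+offset key is safe to re-process.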