Search code examples
hazelcasthazelcast-imaphazelcast-jet

Does Hazelcast Jet support rolling number as IMap key, with Kafka as a source?


I've used Hazelcast some time ago, and I'm using Hazelcast Jet for the first time and seems interesting for processing some real-time streaming, exploring through.

Here I have a situation, I'm pulling Kafka topic to IMap using:

private static Pipeline buildPipelineForClientDataa() {
        Pipeline p = Pipeline.create();
        p.drawFrom(KafkaSources.kafka(
                props("bootstrap.servers", BOOTSTRAP_SERVERS, 
                        "key.deserializer", StringDeserializer.class.getCanonicalName(), 
                        "value.deserializer", StringDeserializer.class.getCanonicalName(), 
                        "auto.offset.reset", AUTO_OFFSET_RESET), 
                KAFKA_TOPIC))
        .withoutTimestamps()
        .drainTo(Sinks.map(SINK_CLINET_DATA));
        return p;
    }

Well, I've no Key for the topic. Should I've an option to assign rolling number as a key? If so, help me with the technique. Thanks.


Solution

  • Using incrementing number isn't a good match for Jet because it's a distributed system. It works with a partitioned stream and each stream partition should be independent. You would need to route all items through a non-parallel processor.

    You could use UUID or Hazelcast's FlakeIdGenerator as the key, but if the job restarts and re-processes the Kafka topic from a snapshotted offset, the same items will have a different key assigned and will be present twice in the target map.

    If you want to have each item in the map, you can use Kafka's topic+partitionId+offset combination as the key:

    p.drawFrom(KafkaSources.kafka(
        props(...),
        record -> Util.entry(
            Tuple3.tuple3(record.topic(), record.partition(), record.offset()),
            record.value()),
        KAFKA_TOPIC))
    

    You can omit the topic if you have just one topic.