apache-kafka apache-flink kafka-producer-api

Flink kafka - Flink job not sending messages to different partitions

I have the below configuration:

One kafka topic with 2 partitions
One zookeeper instance
One kafka instance
Two consumers with same group id

Flink job snippet:

speStream.addSink(new FlinkKafkaProducer011(kafkaTopicName,new 
SimpleStringSchema(), props));

Scenario 1:

I have written a flink job (Producer) on eclipse which is reading a file from a folder and putting the msgs on kafka topic.

So when i run this code using eclipse, it works fine.

For example : If I place a file with 100 records, flink sends few msgs to partition 1 & few msgs to partition 2 and hence both the consumers gets few msgs.

Scenario 2: When i create the jar of the above code and run it on flink server, flink sends all the msgs to a single partition and hence only one consumer get all the msgs.

I want the scenario 1 using the jar created in scenario 2.

Solution

If you do not provide a FlinkKafkaPartitioner or do not explicitly say to use Kafka's one a FlinkFixedPartitioner will be used, meaning that all events from one task will end up in the same partition.

To use Kafka's partitioner use this ctor:

speStream.addSink(new FlinkKafkaProducer011(kafkaTopicName,new SimpleStringSchema(), props), Optional.empty());

The difference between running from IDE and eclipse are probably because of different setup for parallelism or partitioning within Flink.