java · apache-kafka · apache-flink · kafka-producer-api

DataSet to Kafka using Flink: is it possible?


I have a use case where I need to move records from Hive to Kafka. I couldn't find a way to add a Kafka sink directly to a Flink DataSet, so I used a workaround: I call the map transformation on the DataSet and, inside the map function, call kafkaProducer.send() for each record.

The problem is that I have no way to execute kafkaProducer.flush() on every worker node, so the number of records written to Kafka is always slightly lower than the number of records in the DataSet.

Is there an elegant way to handle this? Is there any way to add a Kafka sink to a DataSet in Flink, or a way to call kafkaProducer.flush() as a finalizer?


Solution

  • You could simply create a sink that uses a KafkaProducer under the hood and writes the data to Kafka, for example along the lines of the sketch below.
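
A minimal sketch of such a sink, assuming the DataSet API's OutputFormat and String records; the class name KafkaOutputFormat, the broker address, and the topic are placeholders, not part of the original answer. Because open() and close() run on every parallel task, closing the producer in close() flushes each worker's buffered records, which addresses the missing-flush problem from the question:

```java
import java.util.Properties;

import org.apache.flink.api.common.io.RichOutputFormat;
import org.apache.flink.configuration.Configuration;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaOutputFormat extends RichOutputFormat<String> {

    private final String bootstrapServers;
    private final String topic;

    // KafkaProducer is not serializable, so create it per task in open()
    private transient KafkaProducer<String, String> producer;

    public KafkaOutputFormat(String bootstrapServers, String topic) {
        this.bootstrapServers = bootstrapServers;
        this.topic = topic;
    }

    @Override
    public void configure(Configuration parameters) {
        // nothing to configure
    }

    @Override
    public void open(int taskNumber, int numTasks) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        producer = new KafkaProducer<>(props);
    }

    @Override
    public void writeRecord(String record) {
        producer.send(new ProducerRecord<>(topic, record));
    }

    @Override
    public void close() {
        if (producer != null) {
            // close() flushes any buffered records before shutting down,
            // so every worker's pending sends reach Kafka
            producer.close();
        }
    }
}
```

You would then attach it to the DataSet instead of the map workaround, e.g. `hiveDataSet.output(new KafkaOutputFormat("broker:9092", "my-topic"));` followed by `env.execute()`.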