Tags: apache-kafka, apache-kafka-streams

Kafka Streams: How to ensure offset is committed after processing is completed


I want to process messages from a Kafka topic using Kafka Streams.

The last step of the processing is to write the result to a database table. To avoid database contention issues (the program will run 24/7 and process millions of messages), I will batch the JDBC calls.

But in this case there is a possibility of losing messages: say I read 500 messages from the topic and Streams commits their offsets, and then the program fails. The messages sitting in the pending JDBC batch are lost, yet their offsets are already committed.

I want to manually commit the offset of the last message only once the database insert/update completes, but that is not possible according to this question: How to commit manually with Kafka Stream?.
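To make the failure mode concrete, here is a minimal sketch of the kind of topology I mean. The `JdbcBatchWriter` interface and the topic name are hypothetical stand-ins, and a single stream thread is assumed so the shared batch list is safe:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class LossyBatchingTopology {

    // Hypothetical helper that executes a JDBC batch insert.
    interface JdbcBatchWriter {
        void flush(List<String> rows);
    }

    static StreamsBuilder build(JdbcBatchWriter writer) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> stream = builder.stream("input-topic");

        List<String> batch = new ArrayList<>(); // assumes a single stream thread
        stream.foreach((key, value) -> {
            batch.add(value);
            if (batch.size() >= 500) {
                writer.flush(batch); // JDBC batch insert
                batch.clear();
            }
        });
        // Kafka Streams commits offsets on its own commit.interval.ms schedule,
        // independent of whether flush() has run: a crash while records are
        // buffered loses them even though their offsets are already committed.
        return builder;
    }
}
```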

Can someone please suggest a possible solution?


Solution

  • Kafka Streams does not support manual commits, nor does it support batch processing. For your use case there are a few possibilities:

    1. Use a plain Kafka consumer, implement the batching yourself, and commit offsets manually (see the first sketch after this list).

    2. Use Spark Structured Streaming with Kafka, as described in Kafka Spark Structured Stream.

    3. Try Spring Kafka (see the batch-listener sketch after this list).

    4. In this kind of scenario it is also worth considering Kafka Connect with the JDBC sink connector: Kafka JDBC Connector.
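For option 1, here is a minimal sketch with a plain `KafkaConsumer`: auto-commit is disabled, each polled batch is written to the database in a single JDBC batch, and `commitSync()` runs only after the write succeeds. The bootstrap server, group id, topic, JDBC URL, and table are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BatchingConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "jdbc-batch-writer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // manual offset control
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");     // cap the batch size
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/mydb")) {
            consumer.subscribe(List.of("input-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;
                try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO events(payload) VALUES (?)")) {
                    for (ConsumerRecord<String, String> rec : records) {
                        ps.setString(1, rec.value());
                        ps.addBatch();
                    }
                    ps.executeBatch();
                }
                // Commit only after the batch is safely in the database; if the
                // app crashes before this point, the records are re-consumed.
                consumer.commitSync();
            }
        }
    }
}
```

Note that this gives at-least-once delivery: after a crash, the last uncommitted batch is consumed again, so the database write should be idempotent (for example, an upsert keyed on a unique message id) to tolerate duplicates.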
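For option 3, Spring Kafka supports the same pattern with a batch listener and manual acknowledgment. A minimal sketch, assuming a `ConsumerFactory` bean is defined elsewhere; the topic name and the database write are placeholders:

```java
import java.util.List;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.support.Acknowledgment;

@Configuration
public class BatchListenerConfig {

    // Container factory configured for batch delivery with manual acknowledgment.
    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> batchFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.setBatchListener(true);
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
        return factory;
    }

    @KafkaListener(topics = "input-topic", containerFactory = "batchFactory")
    public void onBatch(List<String> values, Acknowledgment ack) {
        // writeToDatabase(values);  // hypothetical JDBC batch insert
        ack.acknowledge(); // commit offsets only after the DB write succeeds
    }
}
```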