Search code examples
apache-kafkatransactions

Transactional Publish to Kafka


Der all,

I am pondering a problem for some time now and while I do have a conceptual solution, I am not sure, there isn't an easier one especially if there is a library already encapsulating such a solution.

Basically, I want to send a message to Kafka iff a database transaction commits. This message must be reliably stored so that no outage (network or power) can prevent this message from being lost. No XA may be used.

Obviously, out of the blue this is hard. If the transaction to the database is committed prior to sending the message, a number of things can go wrong: power to the application-server can be lost, network connections can break down at most-inconvenient moments and so on. The same is true if the message to Kafka is sent prior to committing the database.

My solution-draft looks like this:

  • An additional event table is created in the database
  • Instead of sending a message to Kafka, the event is stored in this table
  • A second process reads from this table and posts messages to Kafka

The second process works in a journal style:

  • Read an unsent message, mark this message as in progress, commit to DB
  • Post message to Kafka
  • if not successful: redo, else:
  • read message from Kafka
  • mark message as sent

The trick here would be to re-read messages from Kafka to recover, if a system outage occurs. All messages, which can not be read from Kafka must be considered lost and then re-sent.

No, I assume, I am not the only one who must not lose a message, so I can't imagine this is something which has to be implemented by hand. Am I right?


Solution

  • After some more consulting and thinking this is some kind of XY-Question. What my question originally asked was "how do I implement synchronous semantics with Kafka" - which you shouldn't.

    Putting some thought into the business-part of the application, we have not yet found a use case which actually requires the exactly-once semantics. In some cases it is acceptable to lose messages, in other cases we can live with phantom-messages.

    If you really have the requirement for exactly-once you will most certainly find, that you have further requirements which scream for syncronous calls like you need an actual confirmation, the action was executed. Do yourself a favour and try not to implement that using a messaging system.