Search code examples
postgresqltransactionscdcdebezium

Debezium CDC on PostgreSQL and transactions


I'd like to use Debezium for Change Data Capture(CDC) on PostgreSQL database. Right now I don't fully understand how to deal with transactions in Debezium.

Let's consider the following example - I have 2 tables in source PostgreSQL database. I'm inserting data into both of these table in scope of single database transaction. In other words I have to execute two separate INSERT statement. If I understand correctly in case of Debezium CDC it will result into two separate messages into two separate Kafka topics(or AWS Kinesis streams). Each of these messages will be consumed by own consumer and inserted into the target database in separate transaction. In case one of the transaction fail, the other can be successful and I'll run into the data inconsistency state in the target database.

Is there any standard mechanisms in Debezium to handle such situation? Or for example, in order to avoid it, I have to use Kafka Streams API and join these two topics into a single one(on transactionId) before updating the target db?


Solution

  • You describe the default behaviour correctly.

    If you'd like to write multiple records into a sink database using one transaction, you'll likely have to use a bespoke consumer application which buffers the events originating from one transaction internally and writes them out to the sink database at once in a single transaction. Simply pulling events together on one topic won't help you, as a generic sink connector still wouldn't know about the source transactional boundaries.

    You can use Debezium's transaction metadata topic for that. It provides all the info such consumer app would need for implementing such buffering logic. Unfortunately, we don't have a comprehensive demo for this yet, but I hope we'll get to blog about this very soon.