Tags: apache-kafka, apache-kafka-connect, avro

How could Kafka Connect help persist Avro to a database, and why is this a good design approach?


According to this Stack Overflow answer, Kafka Connect helps avoid a "2-step commit" when publishing to Kafka and persisting the Avro message to a database, and it follows a "read your writes" design approach.

I don't see how that works. How is it different from extending my KafkaAvroSerializer, extracting the byte array containing the Avro message, and then, after publishing it, persisting it to the database?
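For context, this is roughly the "publish, then persist" flow I have in mind (a sketch only; topic, table, and connection details are placeholders, and error handling is omitted):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;
import java.util.Properties;

import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import io.confluent.kafka.serializers.KafkaAvroSerializer;

public class PublishThenPersist {

    public void publishThenPersist(GenericRecord record) throws Exception {
        // Serialize the Avro record to bytes ourselves so the same bytes can be reused for the DB write.
        KafkaAvroSerializer serializer = new KafkaAvroSerializer();
        serializer.configure(Map.of("schema.registry.url", "http://localhost:8081"), false);
        byte[] avroBytes = serializer.serialize("orders", record);

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // Step 1: publish to Kafka.
            producer.send(new ProducerRecord<>("orders", avroBytes)).get();
        }

        // Step 2: persist the same bytes to the database. If this fails after the send
        // succeeded (or vice versa), the two stores are out of sync.
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/app");
             PreparedStatement ps = conn.prepareStatement("INSERT INTO orders_avro (payload) VALUES (?)")) {
            ps.setBytes(1, avroBytes);
            ps.executeUpdate();
        }
    }
}
```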

I get that the approach I suggest creates the problem of not knowing what to prioritise when something goes wrong on either the database or Kafka, but I don't see how the Kafka Connect approach avoids exactly the same issue.


Solution

  • Thanks for finding my other answer. The difference is that you're describing a two-phase commit, which isn't what Kafka Connect does.

    At a high level, Kafka Connect (for sink connectors, such as one writing to a database) is built on top of the Consumer API and does exactly what you've described (with KafkaAvroDeserializer, assuming you've set up AvroConverter). But it offers built-in scalability and fault-tolerance mechanisms, and it is configuration-driven (no coding required, unless you need custom plugins).
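    To illustrate the "config-driven" part, a sink that writes an Avro topic into a database could be little more than a config file. A rough sketch, assuming Confluent's JDBC sink connector and Schema Registry (connection details and names are placeholders):

```properties
name=orders-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=2
topics=orders

# Avro values are decoded by the AvroConverter via Schema Registry
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
key.converter=org.apache.kafka.connect.storage.StringConverter

# Target database (placeholder connection details)
connection.url=jdbc:postgresql://localhost:5432/app
connection.user=app
connection.password=secret
auto.create=true
insert.mode=upsert
pk.mode=record_key
```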

    It's worth pointing out that KafkaAvroDeserializer doesn't return a byte array of the Avro record; it returns the deserialized Avro object.
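    For example, a plain consumer configured with KafkaAvroDeserializer hands you decoded records rather than raw bytes (a sketch; broker and registry URLs are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AvroConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "orders-sink");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081");

        try (KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, GenericRecord> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, GenericRecord> record : records) {
                // The value is already a decoded Avro object, not a byte array.
                GenericRecord value = record.value();
                System.out.println(value);
            }
        }
    }
}
```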

    The alternative, "read your writes" approach is the "Outbox Pattern" with CDC: you write to your database, then use Debezium (built on the Connect API) to read the database and write to Kafka. This has the added benefit of consistent database transactions and durability.
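    A minimal sketch of the write side of that pattern, assuming a hypothetical `orders` table and `outbox` table (Debezium would then capture the outbox rows and publish them to Kafka):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class OutboxWriteSketch {

    public void placeOrder(String orderId, byte[] avroPayload) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/app")) {
            conn.setAutoCommit(false);
            try (PreparedStatement order = conn.prepareStatement(
                         "INSERT INTO orders (id, status) VALUES (?, 'PLACED')");
                 PreparedStatement outbox = conn.prepareStatement(
                         "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, 'OrderPlaced', ?)")) {
                order.setString(1, orderId);
                order.executeUpdate();

                outbox.setString(1, orderId);
                outbox.setBytes(2, avroPayload);
                outbox.executeUpdate();

                // One local transaction covers both the business row and the event row,
                // so "read your writes" holds; CDC later streams the outbox table to Kafka.
                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```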