Search code examples
mysqlapache-kafkaapache-kafka-connectdebezium

Usability of Binary Log in data streaming in MYSQL: What are the drawbacks and advantages?


I have been trying to read data out of MySQL using Kafka connect using MySQL source connector for database and debezium connector for bin logs. I am trying to understand what could be the better way to pull the change data. Bin logs has overhead of writing to logs etc while reading from database has the overhead of querying database. What are the other major advantages and disadvantages that are associated with both of these approaches? What could be a better way of capturing change data? Also starting from MySQL 8 the bin logs are enabled by default. Does this mean it could be a better way of doing things?


Solution

  • This question can be summarized as follows:

    What are the pros and cons of a log-based CDC (represented by Debezium Connector) versus a polling-based CDC (represented by JDBC Source Connector)?

    Query-based CDC:

    • ✓ Usually easier to setup
    • ✓ Requires fewer permissions
    • ✗ Impact of polling the DB
    • ✗ Needs specific columns in source schema to track changes
    • ✗ Can't track deletes
    • ✗ Can't track multiple events between polling interval

    Log-based CDC:

    • ✓ All data changes are captured
    • ✓ Low delays of events while avoiding increased CPU load
    • ✓ No impact on data model
    • ✓ Can capture deletes
    • ✓ Can capture old record state and further meta data
    • ✗ More setup steps
    • ✗ Higher system previleges required
    • ✗ Can be expensive for some proprietary DB

    Reference:

    1. Five Advantages of Log-Based Change Data Capture by Gunnar Morling
    2. No More Silos: How to Integrate Your Databases with Apache Kafka and CDC by Robin Moffatt
    3. StackOverflow: Kafka Connect JDBC vs Debezium CDC