apache-kafka apache-kafka-connect debezium

Debezium connector for MySQL. The db history topic is missing


I'm using the Debezium connector version 0.8 to capture changes from a MySQL database and stream them to Kafka. Everything runs in Docker, with one container for MySQL, one for the connector, and one for Kafka.

When I stop the containers (docker-compose down) and start them again, I usually get the following error:

org.apache.kafka.connect.errors.ConnectException: The db history topic is missing. You may attempt to recover it by reconfiguring the connector to SCHEMA_ONLY_RECOVERY

I have read the solution for this issue on the official page here:

https://debezium.io/blog/2018/03/16/note-on-database-history-topic-configuration/

I followed those steps, and I think my configuration is correct:

log.retention.bytes = -1
log.retention.hours = 168       
log.retention.minutes = null
log.retention.ms = -1

Note that because log.retention.ms is set to -1, log.retention.minutes and log.retention.hours are ignored, as the official documentation explains. That should rule out both the size-based and the time-based retention problems.

So, does anybody know why I'm getting this error?

This is part of a university project. I don't think I can share the complete docker-compose file before it is published at my university, but I can show the parts related to this problem. I don't think this is a configuration problem, because there is nothing unusual in my docker-compose file.

mysql:
    image: mysql/5.7:configured  # Little changes like enabling queries...
    environment:
     - MYSQL_ROOT_PASSWORD=debezium
     - MYSQL_USER=mysqluser
     - MYSQL_PASSWORD=mysqlpw
    volumes:
     - "sql_Data:/var/lib/mysql"
     - "sql_LogError:/var/log/mysql"

kafka:
    image: debezium/kafka:0.8
    depends_on:
     - zookeeper
    environment:
     - HOST_NAME=xxxx
     - ADVERTISED_HOST_NAME=xxxx
     - ZOOKEEPER_CONNECT=zookeeper:2181
     - KAFKA_CREATE_TOPICS="events:1:1"
     - KAFKA_LOG_RETENTION_MS=-1
    volumes:
     - "kafka_Data:/kafka/data"
     - "kafka_Log:/kafka/logs"
     - "kafka_Conf:/kafka/config"

connect:
    image: debezium/connect:0.8
    depends_on:
     - zookeeper
     - kafka
     - mysql
    environment:
     - HOST_NAME=xxxx
     - ADVERTISED_HOST_NAME=xxxx
     - BOOTSTRAP_SERVERS=xxxx:9092
     - GROUP_ID=1
     - CONFIG_STORAGE_TOPIC=my_connect_configs
     - OFFSET_STORAGE_TOPIC=my_connect_offsets
     - STATUS_STORAGE_TOPIC=my_connect_statuses
volumes: 
  sql_Data:
  sql_LogError:
  kafka_Data:
  kafka_Log:
  kafka_Conf:

The remaining parts are only network definitions or otherwise not relevant.


Solution

  • Finally, after struggling with this problem for many days, I found the cause and the solution.

    There is an error in the documentation of the debezium/zookeeper image, as you can see here:

    link to debezium/zookeeper image on Docker Hub

    The documentation lists three volumes for persisting the data ZooKeeper needs. The paths of these volumes are:

    1. /zookeeper/data
    2. /zookeeper/logs
    3. /zookeeper/conf

    The problem is that the second path is wrong. According to the image's Dockerfile, the volume that stores the transaction log must be mounted at:

    /zookeeper/txns

    Here is the relevant snippet of its Dockerfile:

    # Expose the ports and set up volumes for the data, transaction log, and configuration
    EXPOSE 2181 2888 3888
    VOLUME ["/zookeeper/data","/zookeeper/txns","/zookeeper/conf"]
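    Applied to a docker-compose file written in the same style as the one in the question, the fix could look roughly like the sketch below. The zookeeper service was not shown in the question, so the 0.8 image tag and the zookeeper_* volume names are assumptions chosen to match the other services; only the three container paths come from the Dockerfile. My reading, which the Dockerfile alone does not prove, is that with the transaction log mounted at /zookeeper/logs it was never written to a named volume, so ZooKeeper lost state on docker-compose down and Kafka lost its topic metadata with it.

    ```yaml
    # Sketch only: the image tag and volume names are assumed, not from the original file.
    zookeeper:
        image: debezium/zookeeper:0.8
        volumes:
         - "zookeeper_Data:/zookeeper/data"
         # Transaction log: the Dockerfile declares /zookeeper/txns, not /zookeeper/logs.
         - "zookeeper_Txns:/zookeeper/txns"
         - "zookeeper_Conf:/zookeeper/conf"

    # These entries would be added to the existing top-level volumes list.
    volumes:
      zookeeper_Data:
      zookeeper_Txns:
      zookeeper_Conf:
    ```

    With named volumes on all three paths, the ZooKeeper state should survive docker-compose down, as long as the volumes themselves are not removed (for example with the -v flag).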