apache-kafka, apache-kafka-connect, debezium

How to distribute messages to all partitions in the topic defined by `offset.storage.topic` in Kafka Connect


I have deployed Debezium using the Docker image pulled with `docker pull debezium/connect`.

In the documentation at https://hub.docker.com/r/debezium/connect, the description of the environment variable OFFSET_STORAGE_TOPIC is as follows:

This environment variable is required when running the Kafka Connect service. Set this to the name of the Kafka topic where the Kafka Connect services in the group store connector offsets. The topic must have a large number of partitions (e.g., 25 or 50), be highly replicated (e.g., 3x or more) and should be configured for compaction.

I've created the required topic, named mydb-connect-offsets, with 25 partitions and a replication factor of 5.

The deployment is successful and everything is working fine. A sample message in the mydb-connect-offsets topic looks like this: the key is ["sample-connector",{"server":"mydatabase"}] and the value is

{
   "transaction_id": null,
   "lsn_proc": 211534539955768,
   "lsn_commit": 211534539955768,
   "lsn": 211534539955768,
   "txId": 709459398,
   "ts_usec": 1675076680361908
}

As the key is fixed, all the messages go to the same partition of the topic. My question is: why does the documentation say that the topic must have a large number of partitions when only one partition is ever going to be used? Also, what needs to be done to distribute the messages across all partitions?


Solution

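  • The offsets are keyed by connector name (together with the source partition, here {"server":"mydatabase"}) because they must be ordered. Kafka guarantees ordering only within a single partition, so every offset record for a given key has to land in the same partition.

    The large partition count exists so that the offsets of many distinct connectors can be stored in parallel, not just those of one. With a single connector and a single source partition, seeing only one topic partition in use is expected.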
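
    To illustrate the point, here is a minimal sketch of key-based partitioning. It is not Kafka's actual code: `pick_partition` and `offset_key` are hypothetical helpers, the connector and server names in the loop are made up, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to the key bytes. The principle (hash of the key modulo the partition count) is the same.

```python
import json
import zlib

NUM_PARTITIONS = 25  # partition count of the offsets topic in the question


def pick_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash the key bytes and take the result modulo the partition count.

    CRC32 is a stand-in here (assumption for illustration); Kafka's default
    partitioner uses murmur2, so real partition numbers will differ.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def offset_key(connector: str, server: str) -> str:
    """Build a key shaped like the one in the question (compact JSON)."""
    return json.dumps([connector, {"server": server}], separators=(",", ":"))


# One connector: the key never changes, so the partition never changes.
single = offset_key("sample-connector", "mydatabase")
single_partitions = {pick_partition(single) for _ in range(1000)}

# Many connectors (hypothetical names): different keys spread over partitions.
many = [offset_key(f"connector-{i}", f"db-{i}") for i in range(50)]
many_partitions = {pick_partition(k) for k in many}

print(single, "-> partitions used:", sorted(single_partitions))
print("50 connectors -> distinct partitions used:", len(many_partitions))
```

    Running this shows a single partition for the fixed key and several partitions once there are many connectors, which is the situation the documentation's recommendation is aimed at.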