apache-kafka apache-kafka-connect debezium

Kafka Connect best practices for topic compaction


I am using Debezium, which makes use of Kafka Connect. Kafka Connect relies on a couple of topics that need to be created:

OFFSET_STORAGE_TOPIC This environment variable is required when running the Kafka Connect service. Set this to the name of the Kafka topic where the Kafka Connect services in the group store connector offsets. The topic should have many partitions, be highly replicated (e.g., 3x or more) and should be configured for compaction.

STATUS_STORAGE_TOPIC This environment variable should be provided when running the Kafka Connect service. Set this to the name of the Kafka topic where the Kafka Connect services in the group store connector status. The topic can have multiple partitions, should be highly replicated (e.g., 3x or more) and should be configured for compaction.

Does anyone have any specific recommended compaction configs for these topics?

For example, is it enough to set just:

cleanup.policy: compact

unclean.leader.election.enable: true

or also:

min.compaction.lag.ms: 60000

segment.ms: 1800000

min.cleanable.dirty.ratio: 0.01

delete.retention.ms: 100

Solution

  • The defaults should be fine. Connect creates and configures these topics on its own, unless you have already created them yourself with your own settings.

    These are the only cases I can think of where adjusting the compaction settings makes sense:

    1. A Connect group lingers on the topic longer than you would like. For example, a source connector doesn't start immediately after a long downtime because it is still reading through the offsets topic.
    2. Your Connect cluster doesn't accurately report its state, or tasks don't rebalance properly (because the status topic is in a bad state).
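If you do hit one of the two cases above, the overrides can be applied to the existing topic instead of recreating it. The following is only a minimal sketch: it builds (but does not run) a `kafka-configs.sh` invocation. The topic name `connect-offsets`, the broker address, and the helper `build_alter_command` are assumptions made for illustration, and the values are simply the ones from the question, not recommendations. It also assumes a Kafka version recent enough for `kafka-configs.sh` to accept `--bootstrap-server` for topic configs.

```python
import shlex


def build_alter_command(topic, overrides, bootstrap="localhost:9092"):
    """Return the argv list for a kafka-configs.sh call that sets topic-level overrides.

    Hypothetical helper: it only assembles the command, it does not execute it.
    """
    # --add-config takes comma-separated key=value pairs
    pairs = ",".join(f"{key}={value}" for key, value in overrides.items())
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap,
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        "--add-config", pairs,
    ]


# Illustrative values taken from the question; tune them to your own workload.
overrides = {
    "min.cleanable.dirty.ratio": "0.01",
    "segment.ms": "1800000",
}

# "connect-offsets" is an assumed topic name; use whatever OFFSET_STORAGE_TOPIC is set to.
alter_cmd = build_alter_command("connect-offsets", overrides)
print(shlex.join(alter_cmd))
```

Printing the command first makes it easy to review before running it by hand or passing the list to `subprocess.run`.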

    The __consumer_offsets topic (also compacted) is what sink connectors use. It is configured separately and applies to all consumers, not only Connect.
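If you would rather pre-create the Connect topics than let Connect create them, as mentioned at the start of this answer, setting `cleanup.policy=compact` at creation time is the essential part. The sketch below only assembles the `kafka-topics.sh` commands. The topic names, the broker address, and the helper `build_create_command` are assumptions for illustration. The partition counts (25 for offsets, 5 for status) are what I believe Connect uses by default when it creates the topics itself, so verify them against your version.

```python
import shlex


def build_create_command(topic, partitions, replication_factor=3,
                         bootstrap="localhost:9092"):
    """Return the argv list for a kafka-topics.sh call creating a compacted topic.

    Hypothetical helper: it only assembles the command, it does not execute it.
    """
    return [
        "kafka-topics.sh",
        "--bootstrap-server", bootstrap,
        "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
        # Compaction is the one setting the quoted docs ask for explicitly.
        "--config", "cleanup.policy=compact",
    ]


# Assumed topic names; they must match OFFSET_STORAGE_TOPIC and STATUS_STORAGE_TOPIC.
create_cmds = [
    build_create_command("connect-offsets", partitions=25),
    build_create_command("connect-status", partitions=5),
]

for cmd in create_cmds:
    print(shlex.join(cmd))
```

A replication factor of 3 follows the "3x or more" guidance quoted in the question and requires at least three brokers.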