Search code examples
javamysqlapache-kafkaapache-kafka-connectdebezium

Debezium flush timeout and OutOfMemoryError errors with MySQL


Using Debezium 0.7 to read from MySQL but getting flush timeout and OutOfMemoryError errors in the initial snapshot phase. Looking at the logs below it seems like the connector is trying to write too many messages in one go:

WorkerSourceTask{id=accounts-connector-0} flushing 143706 outstanding messages for offset commit   [org.apache.kafka.connect.runtime.WorkerSourceTask]
WorkerSourceTask{id=accounts-connector-0} Committing offsets   [org.apache.kafka.connect.runtime.WorkerSourceTask]
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
WorkerSourceTask{id=accounts-connector-0} Failed to flush, timed out while waiting for producer to flush outstanding 143706 messages   [org.apache.kafka.connect.runtime.WorkerSourceTask]

Wonder what the correct settings are http://debezium.io/docs/connectors/mysql/#connector-properties for sizeable databases (>50GB). I didn't have this issue with smaller databases. Simply increasing the timeout doesn't seem like a good strategy. I'm currently using the default connector settings.

Update

Changed the settings as suggested below and it fixed the problem:

OFFSET_FLUSH_TIMEOUT_MS: 60000  # default 5000
OFFSET_FLUSH_INTERVAL_MS: 15000  # default 60000
MAX_BATCH_SIZE: 32768  # default 2048
MAX_QUEUE_SIZE: 131072  # default 8192
HEAP_OPTS: '-Xms2g -Xmx2g'  # default '-Xms1g -Xmx1g'

Solution

  • This is a very complex question - first of all, the default memory settings for Debezium Docker images are quite low so if you are using them it might be necessary to increase them.

    Next, there are multiple factors at play. I recommend to do follwoing steps.

    1. Increase max.batch.size and max.queue.size - reduces number of commits
    2. Increase offset.flush.timeout.ms - gives Connect time to process accumulated records
    3. Decrease offset.flush.interval.ms - should reduce the amount of accumulated offsets

    Unfortunately there is an issue KAFKA-6551 lurking in backstage that can still play a havoc.