I'm exporting a database with Debezium. I previously tested this setup with about 1% of the production data and it worked correctly, but in production I'm getting a mismatch between the row counts in the database and the number of messages that Debezium exported.
E.g. I have a table db.large which has ~259 million entries, but Debezium only exported about 200 million. For some other tables, I'm getting more messages exported by Debezium than are actually present in the table (this is just during the initial snapshot). For a small table with just 542 entries, the counts match.
I see some "Failed to flush" and "Failed to commit offsets" messages in the logs, but they do not occur for every offset flush; some flushes are successful. Could these flush/commit failures be the reason for the mismatch?
I'm using the MySQL connector with Debezium 1.7.
Here are partial logs demonstrating the mismatch:
INFO || WorkerSourceTask{id=connector-v1-0} flushing 5722 outstanding messages for offset commit
ERROR || WorkerSourceTask{id=connector-v1-0} Failed to flush, timed out while waiting for producer to flush outstanding 211 messages
ERROR || WorkerSourceTask{id=connector-v1-0} Failed to commit offsets
INFO MySQL|connector_v1|snapshot Exported 201944873 of 259000000 records for table 'db.large' after 10:09:38.853
INFO MySQL|connector_v1|snapshot Exported 202002217 of 259000000 records for table 'db.large' after 10:09:49.062
INFO MySQL|connector_v1|snapshot Exported 202057513 of 259000000 records for table 'db.large' after 10:09:59.281
INFO MySQL|connector_v1|snapshot Exported 202112809 of 259000000 records for table 'db.large' after 10:10:09.488
INFO MySQL|connector_v1|snapshot Exported 202168105 of 259000000 records for table 'db.large' after 10:10:19.669
INFO MySQL|connector_v1|snapshot Exported 202221353 of 259000000 records for table 'db.large' after 10:10:30.152
INFO || WorkerSourceTask{id=connector-v1-0} flushing 5788 outstanding messages for offset commit
INFO MySQL|connector_v1|snapshot Exported 202278697 of 259000000 records for table 'db.large' after 10:10:40.334
ERROR || WorkerSourceTask{id=connector-v1-0} Failed to flush, timed out while waiting for producer to flush outstanding 561 messages
ERROR || WorkerSourceTask{id=connector-v1-0} Failed to commit offsets
INFO MySQL|connector_v1|snapshot Exported 202336041 of 259000000 records for table 'db.large' after 10:10:50.352
INFO MySQL|connector_v1|snapshot Finished exporting 202353026 records for table 'db.large'; total duration '10:10:53.191'
INFO MySQL|connector_v1|snapshot Exporting data from table 'db.small' (2 of 7 tables)
INFO MySQL|connector_v1|snapshot For table 'db.small' using select statement: 'SELECT `field1`, `field2`, `field3` FROM `db`.`small`'
INFO MySQL|connector_v1|snapshot Finished exporting 500 records for table 'db.small'; total duration '00:00:00.021'
INFO MySQL|connector_v1|snapshot Exporting data from table 'db.medium' (3 of 7 tables)
INFO MySQL|connector_v1|snapshot For table 'db.medium' using select statement: 'SELECT `field1`, `field2`, `field3` FROM `db`.`medium`'
INFO MySQL|connector_v1|snapshot Exported 84873 of 14000000 records for table 'db.medium' after 00:00:10.006
INFO MySQL|connector_v1|snapshot Exported 170889 of 14000000 records for table 'db.medium' after 00:00:20.172
INFO MySQL|connector_v1|snapshot Exported 258953 of 14000000 records for table 'db.medium' after 00:00:30.267
INFO MySQL|connector_v1|snapshot Exported 349065 of 14000000 records for table 'db.medium' after 00:00:40.392
Any thoughts? Thanks
Figured this out: the number of messages exported was actually correct.
The counts Debezium prints in those snapshot log lines are not all actual counts. The table size it reports is an estimated row count, so it does not have to match the number of rows that really get exported: https://github.com/debezium/debezium/blob/8d71080a9a8aac875e338964af417dc8de93dfcc/debezium-connector-mysql/src/main/java/io/debezium/connector/mysql/MySqlConnection.java#L427
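To see the gap for each table at a glance, the snapshot log lines can be parsed and the two numbers put side by side. This is only a rough sketch in Python: the function name `summarize` and the regular expressions are my own, written against the log format shown in the question, and it assumes that the "of N" figure in the progress lines is the estimate while the "Finished exporting N records" figure is the number of rows actually read. An exact `SELECT COUNT(*)` on the table is still the authoritative check.

```python
import re

# Progress lines: "Exported <so_far> of <estimated_total> records for table '<name>'"
PROGRESS = re.compile(r"Exported (\d+) of (\d+) records for table '([^']+)'")
# Final lines: "Finished exporting <actual> records for table '<name>'"
FINISHED = re.compile(r"Finished exporting (\d+) records for table '([^']+)'")


def summarize(lines):
    """Return {table: (exported, estimated_total)} for every finished table.

    estimated_total is None when no progress line was logged for the table
    (e.g. a table small enough to finish before the first progress report).
    """
    estimates = {}
    exported = {}
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            estimates[match.group(3)] = int(match.group(2))
            continue
        match = FINISHED.search(line)
        if match:
            exported[match.group(2)] = int(match.group(1))
    return {table: (count, estimates.get(table)) for table, count in exported.items()}


# Lines copied from the logs in the question.
sample = [
    "INFO MySQL|connector_v1|snapshot Exported 202336041 of 259000000 records for table 'db.large' after 10:10:50.352",
    "INFO MySQL|connector_v1|snapshot Finished exporting 202353026 records for table 'db.large'; total duration '10:10:53.191'",
    "INFO MySQL|connector_v1|snapshot Finished exporting 500 records for table 'db.small'; total duration '00:00:00.021'",
]

for table, (count, estimate) in summarize(sample).items():
    print(f"{table}: exported={count} estimated_total={estimate}")
```

If the exported figure matches the exact row count from the database, the difference from the estimated total can be ignored.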