I am trying to understand the structure of the messages in the offsets topic used by Debezium. We want to add a new message to the offsets topic so that when Debezium restarts, it resumes from our desired binlog file and position. To do that, we need to understand the structure of the JSON message.
$ kcat -b localhost:9092 -C -t debezium-events-offset-local-topic-events2 -f 'Partition(%p) %k %s\n' -K '|' -O
Partition(0) ["debezium_mysql_connector",{"server":"debezium-cdc-events2"}] {"transaction_id":null,"ts_sec":1698175034,"file":"mysql_bin_log.008984","pos":219,"row":148,"server_id":1,"event":12}
Partition(0) ["debezium_mysql_connector",{"server":"debezium-cdc-events2"}] {"transaction_id":null,"ts_sec":1698175040,"file":"mysql_bin_log.008984","pos":2806147,"row":38,"server_id":1,"event":7}
Partition(0) ["debezium_mysql_connector",{"server":"debezium-cdc-events2"}] {"transaction_id":null,"ts_sec":1698175045,"file":"mysql_bin_log.008984","pos":5757850,"row":61,"server_id":1,"event":28}
Partition(0) ["debezium_mysql_connector",{"server":"debezium-cdc-events2"}] {"transaction_id":null,"ts_sec":1698175050,"file":"mysql_bin_log.008984","pos":8548701,"row":148,"server_id":1,"event":12}
Let's take one of the messages:
Key: ["debezium_mysql_connector",{"server":"debezium-cdc-events2"}]
Value: {"transaction_id":null,"ts_sec":1697687362,"file":"mysql-bin-changelog.001433","pos":0,"row":0,"server_id":1,"event":0} <--- what's the meaning of various attributes here?
What is the meaning of the various attributes in the Value? What is the difference between pos & row? Is server_id always 1? What is the meaning of event?
So here's the breakdown of what each of those values represents:
"transaction_id": represents the current active transaction. If there is none, it is "null".
"ts_sec": represents the time of the most recently seen event, in seconds since the epoch.
"file": represents the current active binlog file being parsed.
"pos": represents the current position within the current binlog file.
"row": represents the granular offset at the binlog position, more on this later.
"server_id": this is the server id from the binlog event. It is probably 1 because that's the server id assigned to your MySQL instance.
"event": the number of events from the binlog position that should be skipped upon restart.
So in effect, where the connector restarts from is a combination of several fields:
You can think of it this way: upon restart, we move the connector position to the "pos" within the specified "file". From there, we begin reading events in the stream, skipping the number of events given by "event". Within the next event, we then skip the number of rows given by "row" before we begin sending changes from the binlog.
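The restart behaviour described above can be modelled with a small sketch. This is only a toy model of the description, not Debezium's actual code: the `resume` function and the `events` list (a stand-in for the binlog events found at and after "pos", each event being a list of rows) are hypothetical names made up for illustration.

```python
import json


def resume(events, offset):
    """Return the rows that would be emitted after restarting from `offset`.

    `events` is a hypothetical stand-in for the binlog events read starting at
    offset["pos"] in offset["file"]; each event is a list of rows.
    """
    emitted = []
    for index, rows in enumerate(events):
        if index < offset["event"]:
            continue  # whole events that were already processed before the restart
        # Only the first event that is not skipped is partially consumed.
        start = offset["row"] if index == offset["event"] else 0
        emitted.extend(rows[start:])
    return emitted


offset = json.loads(
    '{"transaction_id":null,"ts_sec":1698175034,"file":"mysql_bin_log.008984",'
    '"pos":219,"row":2,"server_id":1,"event":1}'
)
events = [["a1", "a2"], ["b1", "b2", "b3"], ["c1"]]
print(resume(events, offset))  # ['b3', 'c1']
```

With "event" set to 1 and "row" set to 2, the first event is skipped entirely and the first two rows of the second event are skipped, so only `b3` and `c1` are emitted. With both set to 0, everything from "pos" onwards is emitted.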
So if you want to roll back to a specific position, you would generally set the "file" and "pos" values to that desired position and set both "row" and "event" to 0.
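As a rough sketch of what such a rollback message could look like, the snippet below builds a key and value in the same shape as the records shown in the question. The helper name `build_offset_record` and its parameters are made up for illustration, and the connector name, server name, binlog file and position are simply copied from the question's output. Replace them with your own values.

```python
import json


def build_offset_record(connector_name, server_name, binlog_file, binlog_pos,
                        server_id=1, ts_sec=0):
    """Build a (key, value) pair of JSON strings for the offsets topic.

    Hypothetical helper: the field layout mirrors the records shown in the
    question; "row" and "event" are 0 so nothing is skipped after "pos".
    """
    key = [connector_name, {"server": server_name}]
    value = {
        "transaction_id": None,
        "ts_sec": ts_sec,
        "file": binlog_file,
        "pos": binlog_pos,
        "row": 0,
        "server_id": server_id,
        "event": 0,
    }

    def compact(obj):
        # Compact separators, so the output has the same shape as existing records.
        return json.dumps(obj, separators=(",", ":"))

    return compact(key), compact(value)


key, value = build_offset_record(
    "debezium_mysql_connector",
    "debezium-cdc-events2",
    "mysql_bin_log.008984",
    219,
    ts_sec=1698175034,
)
print(f"{key}|{value}")
```

The printed `key|value` line uses the same `|` delimiter as the `-K '|'` option in the kcat command from the question. The assumption here is that the key must match the existing key exactly and that the record goes to the same partition as the existing ones, written while the connector is stopped, so verify this against your own setup before producing anything to the topic.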