mysql debezium debezium-engine

Where to begin with fixing an offsets issue with Debezium Engine


I'm using Debezium Engine to sync data from a MySQL database, with org.apache.kafka.connect.storage.FileOffsetBackingStore recording my current change offset. I think a recent power outage on my computer corrupted the offset file. Now, when I start my Debezium Engine app, I get this error:

ERROR io.debezium.embedded.EmbeddedEngine - Unable to configure and start the 'org.apache.kafka.connect.storage.FileOffsetBackingStore' offset backing store
org.apache.kafka.connect.errors.ConnectException: java.io.StreamCorruptedException: invalid stream header: 00000000
    at org.apache.kafka.connect.storage.FileOffsetBackingStore.load(FileOffsetBackingStore.java:86)
    at org.apache.kafka.connect.storage.FileOffsetBackingStore.start(FileOffsetBackingStore.java:59)
    at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:691)
    at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:192)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.StreamCorruptedException: invalid stream header: 00000000
    at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:987)
    at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:414)
    at org.apache.kafka.connect.util.SafeObjectInputStream.<init>(SafeObjectInputStream.java:48)
    at org.apache.kafka.connect.storage.FileOffsetBackingStore.load(FileOffsetBackingStore.java:71)
    ... 8 common frames omitted

I'd like to fix this the proper way, by repairing the Debezium offset file, but I don't know where to begin. I think I first need to figure out which offset I want: the one at which it was failing, or one shortly before it. I may be able to recover the offset from the corrupted file, but if not, can I use a timestamp to find a good offset to start from? Once I know the offset, it looks like I can use this tool to point the file at it: https://github.com/nathan-smit-1/HashmapEditor. But how do I get a list of offsets in chronological order so I know which one to pick?


Solution

  • After a lot of trial and error and online research, I found the following steps to be the best solution:

    1. Delete the corrupt offsets.dat file.
    2. Start Debezium so that it generates a new, working offsets.dat file for this machine.
    3. Search past Debezium logs for an offset that Debezium processed recently (the more recent, the better).
    4. Edit the offsets.dat file with a tool that can write the offset information from step 3 in Java's binary serialization format. A hex editor might work depending on what you're changing, but I haven't tried that; I used a Java app to write the data to the file.
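
Step 4 can be sketched roughly as follows. This is a minimal illustration, not the app I actually used. It relies on the fact that FileOffsetBackingStore stores a Java-serialized HashMap<byte[], byte[]>. The class name OffsetFileWriter and the key and value strings are hypothetical placeholders: the real key depends on your engine name and server name, and the value fields depend on your connector version, so copy the exact JSON from the file generated in step 2 and change only the binlog file and position. Run it with the engine stopped, otherwise Debezium may overwrite the file on its next flush.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: writes a single-entry offsets file in the layout
// FileOffsetBackingStore loads (a serialized HashMap of byte[] to byte[]).
class OffsetFileWriter {

    static void write(Path file, String keyJson, String valueJson) throws IOException {
        Map<byte[], byte[]> raw = new HashMap<>();
        raw.put(keyJson.getBytes(StandardCharsets.UTF_8),
                valueJson.getBytes(StandardCharsets.UTF_8));
        // ObjectOutputStream emits the AC ED 00 05 stream header that the
        // corrupted file was missing ("invalid stream header: 00000000").
        try (OutputStream out = Files.newOutputStream(file);
             ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(raw);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OffsetFileWriter.java <path-to-offsets.dat>");
            System.exit(1);
        }
        // ASSUMPTION: placeholder key and value. Replace them with the exact
        // JSON found in your freshly generated offsets.dat, editing only the
        // binlog file name and position taken from your logs (step 3).
        String key = "[\"my-engine\",{\"server\":\"my-server\"}]";
        String value = "{\"file\":\"mysql-bin.000003\",\"pos\":154}";
        write(Paths.get(args[0]), key, value);
    }
}
```

Keep a backup copy of the working offsets.dat from step 2 before overwriting it, so you can fall back to it if the engine rejects the edited file.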

    *-*-*-*-*-*-*-*-*-*-* UPDATE *-*-*-*-*-*-*-*-*-*-*

    Debezium now provides an offset editor in its examples repository. You can find it at the link below, and it works really well. Follow steps 1, 2 and 3 above as before, and for step 4 use this editor instead.

    https://github.com/debezium/debezium-examples/tree/main/offset-editor
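
Whichever tool you use for the edit, it can help to check the file before restarting the engine. The sketch below is only an illustration and is not part of the Debezium editor; the class name OffsetFileDump is hypothetical. It checks for the Java serialization header (AC ED 00 05, which the corrupted file in the question lacked) and prints the stored entries, assuming they are UTF-8 JSON as written by the engine's default converters. Only run it on offset files you created yourself, since it deserializes the file with a plain ObjectInputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for inspecting a FileOffsetBackingStore file.
class OffsetFileDump {

    // True if the file starts with the Java serialization magic AC ED 00 05.
    static boolean hasSerializationHeader(Path file) throws IOException {
        byte[] head = new byte[4];
        try (InputStream in = Files.newInputStream(file)) {
            int n = in.readNBytes(head, 0, 4); // requires Java 9 or later
            return n == 4
                    && (head[0] & 0xFF) == 0xAC && (head[1] & 0xFF) == 0xED
                    && head[2] == 0x00 && head[3] == 0x05;
        }
    }

    // Reads the serialized map and decodes keys and values as UTF-8 text.
    static Map<String, String> read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(file))) {
            Object obj = ois.readObject();
            if (!(obj instanceof Map)) {
                throw new IOException("Expected a Map but found "
                        + (obj == null ? "null" : obj.getClass().getName()));
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) obj).entrySet()) {
                result.put(decode(e.getKey()), decode(e.getValue()));
            }
            return result;
        }
    }

    private static String decode(Object o) {
        return (o instanceof byte[])
                ? new String((byte[]) o, StandardCharsets.UTF_8)
                : String.valueOf(o);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java OffsetFileDump.java <path-to-offsets.dat>");
            System.exit(1);
        }
        Path file = Paths.get(args[0]);
        if (!hasSerializationHeader(file)) {
            System.out.println("No serialization header: the file is probably corrupt.");
            return;
        }
        read(file).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```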