mongodb, apache-kafka-connect, debezium

Kafka Connect produces CDC events with a null value when reading from MongoDB with Debezium


I am reading a Kafka topic that contains many CDC events produced by Kafka Connect using Debezium. The data source is a MongoDB collection with a TTL. Some of the records in the topic are null, and they appear in between the deletion events. What do they mean?

As I understand it, every CDC event, including deletion events, should have the CDC event structure. Why are there events with a null value?

null,
{
  "after": null,
  "patch": null,
  "source": {
    "version": "0.9.3.Final",
    "connector": "mongodb",
    "name": "test",
    "rs": "rs1",
    "ns": "testestest",
    "sec": 1555060472,
    "ord": 297,
    "h": 1196279425766381600,
    "initsync": false
  },
  "op": "d",
  "ts_ms": 1555060472177
},
null,
{
  "after": null,
  "patch": null,
  "source": {
    "version": "0.9.3.Final",
    "connector": "mongodb",
    "name": "test",
    "rs": "rs1",
    "ns": "testestest",
    "sec": 1555060472,
    "ord": 298,
    "h": -2199232943406075600,
    "initsync": false
  },
  "op": "d",
  "ts_ms": 1555060472177
}

I use the Debezium MongoDB connector (https://debezium.io/docs/connectors/mongodb/) without flattening any events, with the following config:

{   
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.hosts": "live.xxx.xxx:27019",
    "mongodb.name": "testmongodb",
    "collection.whitelist": "testest",
    "tasks.max": 4,
    "snapshot.mode": "never",
    "poll.interval.ms": 15000
}

Solution

  • These are so-called tombstone events, which are used for correct compaction of deleted records. Debezium emits one after each delete event, with the same key and a null value. See https://kafka.apache.org/documentation/#compaction:

    Compaction also allows for deletes. A message with a key and a null payload will be treated as a delete from the log. This delete marker will cause any prior message with that key to be removed (as would any new message with that key), but delete markers are special in that they will themselves be cleaned out of the log after a period of time to free up space. The point in time at which deletes are no longer retained is marked as the "delete retention point" in the above diagram.
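
    To make the effect concrete, here is a minimal sketch in plain Python (no Kafka client involved). It assumes records are simple (key, value) pairs where a value of None stands for a tombstone. The names compact and decode_events, and the keys "doc-1" and "doc-2", are made up for illustration and are not part of Kafka or Debezium. The compact function is a simplification: it only shows the final state once tombstones have taken effect and ignores the period during which Kafka still retains the delete marker itself.

```python
import json
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical record shape: (key, value); value None marks a tombstone.
Record = Tuple[str, Optional[str]]


def compact(log: Iterable[Record]) -> Dict[str, str]:
    """Simplified model of log compaction: keep the latest value per key,
    and drop a key entirely once a tombstone (None value) is seen for it."""
    state: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)  # tombstone: forget everything for this key
        else:
            state[key] = value    # newer value replaces the older one
    return state


def decode_events(log: Iterable[Record]) -> Iterator[Tuple[str, dict]]:
    """Parse CDC events from JSON, skipping tombstones instead of failing on them."""
    for key, value in log:
        if value is None:
            continue  # nothing to parse; the preceding "d" event carries the delete
        yield key, json.loads(value)


# Illustrative log: doc-1 is created, then deleted (delete event plus tombstone).
sample_log = [
    ("doc-1", '{"op": "c"}'),
    ("doc-2", '{"op": "c"}'),
    ("doc-1", '{"op": "d"}'),
    ("doc-1", None),
]

if __name__ == "__main__":
    print(compact(sample_log))
    print([event["op"] for _, event in decode_events(sample_log)])
```

    In this model the delete event alone would leave a record for doc-1 in the compacted log, which is why a separate null-valued record follows it. A consumer that only cares about the change events can skip null values as decode_events does.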