apache-kafka db2 confluent-platform cdc ibm-data-replication

IIDR CDC Kafka message format


We are replicating table data from Db2 to Kafka through IIDR CDC. We have a problem with the format of the data in the Kafka topic, as seen when reading the messages with kafka-avro-console-consumer.

For Db2 columns defined as DEFAULT NULL, a null value looks fine in the Kafka topic (as key: value). But when the value is not null, it is wrapped in a dictionary.

Example output when the column has a value:

"Random_key": {
    "int": 9088245671
  }

Here, the key of the inner entry is the data type of the column and the value is the column value. This output format is undesirable for our application.

If the value is actually null and the column is defined as DEFAULT NULL, it looks fine, just as expected:

 "Random_key": null 

How can we change either the IIDR CDC side or the Kafka side so that the message is always displayed in key: value format, even when a DEFAULT NULL column contains a value? Like this:

"Random_key": 9088245671

Thanks!


Solution

  • This is normal. It means that the field Random_key has an Avro union type. With a union type, the default value has to match a type in the union, and in your case the CDC interprets the nullable database column as a union {null, int}.

    When the field is not null, it is an integer, and in Avro's JSON encoding a non-null union value has to state which type it holds. Imagine you have union {string, int, double}: the field is valid when it is a string, an integer, or a double, but the reader needs to know the actual type of each value.

    Unfortunately, this is the correct behavior, but normally you don't need to care about it. kafka-avro-console-consumer uses a JSON encoder to print the data so that you can read it. In your code, after deserialization, the fields will have the plain types you expect.

    EDIT: If your business absolutely needs the record in plain JSON format, someone who wanted a more readable JSON representation developed a set of encoders/decoders to use instead of the defaults:

    https://github.com/zolyfarkas/avro/commit/8926d6e9384eb3e7d95f05a9d1653ba9348f1966
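
If the console output only needs to be post-processed, the union wrappers can also be stripped with the schema as a guide. The following is a minimal sketch in plain Python, not part of IIDR CDC or Confluent Platform. The record name `Row`, the `SCHEMA` literal, and the helpers `branch_name` and `unwrap` are hypothetical names introduced for illustration. The schema uses `long` rather than `int` because the sample value does not fit in a 32-bit Avro `int`. Named types inside unions are matched by their short name only, so namespaces are not handled.

```python
import json

# Hypothetical schema for illustration: a nullable column becomes a union
# of "null" and a numeric type.
SCHEMA = {
    "type": "record",
    "name": "Row",
    "fields": [
        {"name": "Random_key", "type": ["null", "long"], "default": None},
    ],
}


def branch_name(schema):
    """Return the key Avro's JSON encoding uses for this union branch."""
    if isinstance(schema, str):
        return schema  # primitive such as "long", or a reference to a named type
    if schema["type"] in ("record", "enum", "fixed"):
        return schema["name"]  # simplification: namespace is ignored
    return schema["type"]  # "array", "map", or {"type": "long"}


def unwrap(datum, schema):
    """Strip Avro JSON union wrappers from datum, guided by schema."""
    if isinstance(schema, list):  # a union
        if datum is None:
            return None
        (name, inner), = datum.items()  # wrapper has exactly one entry
        for branch in schema:
            if branch_name(branch) == name:
                return unwrap(inner, branch)
        raise ValueError(f"no union branch named {name!r}")
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            return {
                field["name"]: unwrap(datum[field["name"]], field["type"])
                for field in schema["fields"]
            }
        if kind == "array":
            return [unwrap(item, schema["items"]) for item in datum]
        if kind == "map":
            return {key: unwrap(value, schema["values"]) for key, value in datum.items()}
    return datum  # primitives pass through unchanged


line = '{"Random_key": {"long": 9088245671}}'
print(json.dumps(unwrap(json.loads(line), SCHEMA)))
# prints {"Random_key": 9088245671}
```

Using the schema rather than guessing from the shape of the JSON avoids mistaking a genuine single-field record or map for a union wrapper.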