Tags: apache-kafka, avro, confluent-schema-registry

Kafka topic with data in different formats


I wrote some Avro data to the topic “test-avro” using kafka-avro-console-producer. Then I wrote some plain text data to the same topic “test-avro” using kafka-console-producer. After this, all the data in the topic appeared to be corrupted. Can anyone explain why this happened?
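For reference, here is a minimal sketch of what those two console producers effectively do, written against the plain Kafka and Confluent Java clients rather than the CLI tools; the broker address, schema registry URL, and schema are placeholders:

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MixedFormatProducers {
    public static void main(String[] args) {
        // Producer 1: Avro via Confluent's serializer
        // (wire format: magic byte 0x00 + 4-byte schema ID + Avro-encoded body)
        Properties avroProps = new Properties();
        avroProps.put("bootstrap.servers", "localhost:9092");
        avroProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        avroProps.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        avroProps.put("schema.registry.url", "http://localhost:8081");

        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Test\",\"fields\":[{\"name\":\"f1\",\"type\":\"string\"}]}");
        GenericRecord record = new GenericData.Record(schema);
        record.put("f1", "avro value");

        try (KafkaProducer<String, Object> avroProducer = new KafkaProducer<>(avroProps)) {
            avroProducer.send(new ProducerRecord<>("test-avro", record));
        }

        // Producer 2: plain strings -- raw UTF-8 bytes, no magic byte, no schema ID
        Properties plainProps = new Properties();
        plainProps.put("bootstrap.servers", "localhost:9092");
        plainProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        plainProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> plainProducer = new KafkaProducer<>(plainProps)) {
            plainProducer.send(new ProducerRecord<>("test-avro", "plain text value"));
        }
    }
}
```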


Solution

  • You simply cannot use the avro-console-consumer (or a Consumer with an Avro deserializer) anymore to read the offsets holding the plain-text records, because it will assume all data in the topic is Avro and use Confluent's KafkaAvroDeserializer.

    The plain console-producer uses the StringSerializer, so it pushes raw UTF-8 strings that do not match the wire format the Avro deserializer expects (a magic byte, a 4-byte schema ID, then the Avro-encoded payload).

    The only way to get past those records is to know which offsets are bad and either wait for them to expire from the topic or reset the consumer group to begin after those messages (see the seek sketch below). Alternatively, you can always use the ByteArrayDeserializer and add conditional logic for parsing your messages, to ensure no data loss (see the consumer sketch below).

    tl;dr The producer and consumer must agree on the data format of the topic.
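A hedged sketch of the ByteArrayDeserializer fallback mentioned above: read raw bytes, treat records that start with the Confluent wire-format header (magic byte 0x00 followed by a 4-byte schema ID) as Avro, and treat everything else as plain text. The broker address and group ID are placeholders, and the actual Avro decoding is left as a comment:

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MixedFormatConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "mixed-format-reader");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-avro"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    byte[] value = record.value();
                    if (value != null && value.length >= 5 && value[0] == 0x0) {
                        // Looks like Confluent wire format: 0x00 magic byte,
                        // then a 4-byte schema ID, then the Avro-encoded body.
                        // Decode it here with KafkaAvroDeserializer / the schema registry.
                        System.out.println("offset " + record.offset() + ": probable Avro record");
                    } else {
                        // Anything else is treated as a plain string from the console producer.
                        System.out.println("offset " + record.offset() + ": plain text = "
                                + new String(value, StandardCharsets.UTF_8));
                    }
                }
            }
        }
    }
}
```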
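And a small sketch of the "reset a consumer group to begin after those messages" option, assuming you already know the first offset past the bad records (the offset and partition used here are hypothetical):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SkipBadOffsets {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "avro-reader");
        props.put("key.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081");

        long firstGoodOffset = 42L; // hypothetical: first offset after the plain-text records

        try (KafkaConsumer<Object, Object> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("test-avro", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, firstGoodOffset); // start reading after the bad records
            // consumer.poll(...) from here on will only see Avro-encoded data
        }
    }
}
```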