apache-kafka, confluent-schema-registry

How does the Confluent platform enforce schemas in Kafka?


From time to time I come across Confluent's Schema Registry and its schema validation functionality.

According to the documentation, this works by setting confluent.value.schema.validation or confluent.key.schema.validation to true on a topic. If I then use a tool like kafka-console-producer to send an invalid message, the broker rejects it.

How does that actually work? After all, kafka-console-producer is part of Apache Kafka and not (directly) related to Confluent.

I'm not aware of any Kafka plugin API that would support something like that, nor could I find anything on Confluent's GitHub page. I guess Confluent Platform somehow hides intermediate topics to enable this, but then I still don't know how they trigger an error in kafka-console-producer.


Solution

  • The server-side schema validation feature is only available in Confluent Server, which is a closed-source, enterprise build of Kafka. It is not part of Apache Kafka.

    Still, the basic idea would be to introduce a feature flag in the source code (Confluent maintains its own Kafka fork) and intercept each produce request on the server side, deserializing every record before it is written to disk and verifying it against a schema.

    Regarding "don't know how they trigger an error in kafka-console-producer":

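    I haven't used the feature myself, so I'm not sure which error you're referring to, but the broker could also send back a custom error in its Kafka protocol response, which a standard client such as kafka-console-producer would then report as a failed send.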
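
To make the "intercept and verify" idea above more concrete, here is a minimal sketch of what such a broker-side check might look like. It is not Confluent's implementation, which is not public. It assumes the check only looks at the Confluent wire format (one magic byte of 0 followed by a 4-byte big-endian schema ID) and that subjects follow the default "<topic>-value" naming. KNOWN_SCHEMAS, RecordRejected and validate_value are hypothetical names made up for this illustration, and the dictionary stands in for a real Schema Registry lookup.

```python
import struct

# Hypothetical in-memory stand-in for Schema Registry: schema ID -> subject.
KNOWN_SCHEMAS = {1: "orders-value", 2: "payments-value"}

MAGIC_BYTE = 0  # first byte of the Confluent wire format


class RecordRejected(Exception):
    """Raised by this sketch when a record fails validation."""


def validate_value(topic: str, value: bytes) -> int:
    """Return the schema ID if the value passes the check, else raise."""
    # Wire format: 1 magic byte + 4-byte big-endian schema ID + payload.
    if len(value) < 5:
        raise RecordRejected("value too short to carry a schema ID")
    magic, schema_id = struct.unpack(">BI", value[:5])
    if magic != MAGIC_BYTE:
        raise RecordRejected("missing magic byte")
    # Assumed naming convention: the subject is "<topic>-value".
    if KNOWN_SCHEMAS.get(schema_id) != f"{topic}-value":
        raise RecordRejected(f"schema ID {schema_id} is not registered for {topic}")
    return schema_id


if __name__ == "__main__":
    framed = struct.pack(">BI", MAGIC_BYTE, 1) + b"serialized payload"
    print(validate_value("orders", framed))
    try:
        validate_value("orders", b"plain text")
    except RecordRejected as err:
        print("rejected:", err)
```

Under these assumptions, a plain string typed into kafka-console-producer carries no magic byte or schema ID, so a check of this kind would reject it before anything reaches the log.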