apache-kafka apache-flink aws-glue flink-streaming confluent-schema-registry

FlinkKafkaConsumer / KafkaSource with AWS Glue Schema Registry or Confluent Schema Registry

I'm trying to write a Flink streaming application with a KafkaSource that reads from a topic whose data has an Avro schema defined.

I would like to know how automatic local caching of schemas works in this case, similar to what Confluent's documentation describes.

Basically, the use case is that the consumer should not need to know the schema beforehand. When the consumer is instantiated, it should take the schema registry URL as a parameter and read the schema for that particular topic from the registry.

Is this possible? Any pointers are appreciated!

Solution

The AWS SerDe libraries for Glue use a wire format that contains the UUID of the schema version the message was serialized with. The consuming application reads the schema version ID from the message and loads the schema from the Glue Schema Registry if it is not already in the local cache. You can find a description of the wire format at the bottom of the README for this JavaScript SerDe library (https://github.com/meinestadt/glue-schema-registry).
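To make the lookup concrete, here is a minimal sketch of the consumer side in plain Java. It is not the AWS library's code. It assumes the header layout as I understand it from the wire-format description (one header version byte with value 3, one compression byte, then the 16-byte schema version UUID), and the class name GlueWireFormatSketch and the registryLookup function are hypothetical stand-ins; in a real application the lookup would be a call to the Glue Schema Registry, and the schema would be an Avro Schema object rather than a String.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: parse the Glue header, then resolve the schema through
// a local cache that only calls the registry on a cache miss.
class GlueWireFormatSketch {
    // Assumed layout: byte 0 = header version, byte 1 = compression flag,
    // bytes 2..17 = schema version UUID, remaining bytes = serialized payload.
    static final byte HEADER_VERSION = 3;
    static final int HEADER_SIZE = 18;

    private final Map<UUID, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the remote registry call (an assumption, not a real AWS API).
    private final Function<UUID, String> registryLookup;

    GlueWireFormatSketch(Function<UUID, String> registryLookup) {
        this.registryLookup = registryLookup;
    }

    // Extracts the schema version UUID from the message header.
    static UUID schemaVersionId(byte[] message) {
        if (message.length < HEADER_SIZE || message[0] != HEADER_VERSION) {
            throw new IllegalArgumentException("not a Glue Schema Registry encoded message");
        }
        ByteBuffer buf = ByteBuffer.wrap(message, 2, 16); // big-endian by default
        return new UUID(buf.getLong(), buf.getLong());
    }

    // Returns the bytes after the header. Decompression is ignored in this sketch.
    static byte[] payload(byte[] message) {
        schemaVersionId(message); // validates the header
        return Arrays.copyOfRange(message, HEADER_SIZE, message.length);
    }

    // Looks up the writer schema, hitting the registry only once per schema version.
    String schemaFor(byte[] message) {
        return cache.computeIfAbsent(schemaVersionId(message), registryLookup);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        byte[] message = ByteBuffer.allocate(HEADER_SIZE + 2)
                .put(HEADER_VERSION)
                .put((byte) 0) // no compression
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .put(new byte[] {1, 2}) // dummy payload
                .array();
        GlueWireFormatSketch reader = new GlueWireFormatSketch(versionId -> {
            System.out.println("fetching schema version " + versionId);
            return "{\"type\":\"string\"}";
        });
        System.out.println(reader.schemaFor(message)); // first call goes to the "registry"
        System.out.println(reader.schemaFor(message)); // second call is served from the cache
    }
}
```

Because the schema version ID travels with every message, the consumer only needs the registry configuration at startup and can pick up new schema versions as they appear on the topic.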