apache-kafka, avro, confluent-schema-registry

Should Avro be used on both the key and value in Kafka?


We're working to set up a Kafka cluster and exploring the use of Avro, but I haven't been able to find guidance on whether Avro should be used on both the key and the value of a Kafka message. I've explored both use cases and I'm not really seeing the benefit of applying Avro at the key level. Are there any good reasons to do so? And a follow-up: if not using Avro on the key, what is the preferred converter (String, JSON, etc.)?


Solution

  • If Avro should be used on both the key and value of a Kafka message

    That depends on how you are going to use the key. Keys are usually (though not always) a single field, such as a String or a number, rather than a complex object. In that case, there is no reason to use Avro for them.

    I've explored both use cases and I'm not really seeing the benefit of applying Avro at the key level

    You can decide by considering what you will put in the key. If the value is what you are mostly concerned about, and a simple string or number is enough to identify or classify your Kafka messages, you don't need Avro for the key.

    Sometimes multiple fields make up a key, just as a primary key in an RDBMS can span multiple columns. If your application has (or may have) such a use case, use Avro for the key so that it can support schema evolution.
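
As a rough illustration of the multi-field case, here is a minimal sketch of a composite key described by an Avro record schema, with the Avro serializer configured for both key and value. The record name `OrderKey`, its namespace, and its two fields are hypothetical, as are the broker and Schema Registry addresses. The serializer class name assumes Confluent's `kafka-avro-serializer` is on the classpath when a real producer is created; the snippet itself only builds the settings and needs nothing beyond the JDK.

```java
import java.util.Properties;

public class CompositeKeyConfig {
    // Hypothetical Avro schema for a two-field key (all names are illustrative).
    public static final String ORDER_KEY_SCHEMA =
        "{\"type\":\"record\",\"name\":\"OrderKey\",\"namespace\":\"com.example\","
      + "\"fields\":["
      + "{\"name\":\"customer_id\",\"type\":\"long\"},"
      + "{\"name\":\"order_id\",\"type\":\"long\"}]}";

    // Producer settings that use Avro for BOTH the key and the value.
    public static Properties build(String bootstrapServers, String schemaRegistryUrl) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
            "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.setProperty("value.serializer",
            "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.setProperty("schema.registry.url", schemaRegistryUrl);
        return props;
    }

    public static void main(String[] args) {
        // Assumed local addresses; replace with your own cluster's.
        Properties props = build("localhost:9092", "http://localhost:8081");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
        System.out.println(ORDER_KEY_SCHEMA);
    }
}
```

The resulting `Properties` object would typically be passed to a `KafkaProducer` constructor, with the key built as an Avro record that conforms to the schema above.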

    And a follow-up: if not using Avro on the key, what is the preferred converter (String, JSON, etc.)

    JSON and Avro are meant for complex objects such as your custom POJOs, while the others, such as String and Long, are for single-field values.

    If I want to stream user information that is identified by a user_id, then user_id will be my Kafka message key. In such a case, we can use String or Long.
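
The user_id case could be configured roughly as follows: a plain String serializer for the key and Avro for the value. This is only a sketch. The class name `UserEventProducerConfig` and the broker and Schema Registry addresses are made up for illustration, and the serializer class names assume `kafka-clients` and Confluent's `kafka-avro-serializer` are available when the settings are handed to a real producer. The snippet itself only uses the JDK.

```java
import java.util.Properties;

public class UserEventProducerConfig {
    // Producer settings: simple String key (the user_id), Avro-encoded value.
    public static Properties build(String bootstrapServers, String schemaRegistryUrl) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Key is a single field, so a primitive serializer is enough.
        props.setProperty("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        // Value is a complex object, so it goes through Avro and the Schema Registry.
        props.setProperty("value.serializer",
            "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.setProperty("schema.registry.url", schemaRegistryUrl);
        return props;
    }

    public static void main(String[] args) {
        // Assumed local addresses; replace with your own cluster's.
        Properties props = build("localhost:9092", "http://localhost:8081");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

If user_id is numeric, swapping the key serializer for `org.apache.kafka.common.serialization.LongSerializer` would be the equivalent choice.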

    Avro has a compact binary format. For more on why to use Avro with Kafka, see this article.