
Deserialize Avro in C#


I have a schema written in JSON format, and I receive a string from a Kafka server that looks like this:

\0\0\0\u00032H45d71580-9781-4d9c-8535-a233ff7c3122\nPLANTH45d71580-9781-4d9c-8535-a233ff7c3122\nPLANT,2017-12-12T16:34:15GMT\u001020171212\u0018201712121034\nthertH1AB5297A-9D28-4742-A95C-4A4CEED7037D\nfalse\nfalse\ncross\u00021\u00025

Now I am trying to deserialize the string into an object based on my schema file. How can I do that in C#? Is there a library I can use?

I tried Microsoft.Hadoop.Avro (https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-dotnet-avro-serialization#Scenario1). When the code reaches:

var actual = avroSerializer.Deserialize(buffer);

it throws an exception: "array dimensions exceeded supported range"

I get the string from Kafka: another app produces it and my app consumes it. The producing app is written in Swift and uses some Node.js library for serialization. Could the string's format be the problem?

The Kafka message is produced by a JavaScript app, which serializes it with a library called AVSC (Avro for JavaScript). Once I receive the message (a string), I convert it into a byte stream, and I found that these bytes differ slightly from the original ones generated by the AVSC library. Why?


Solution

  • Confluent's Java library (which I suspect is what the Swift app is using to write to Kafka) writes a magic byte when it serializes to Avro's binary encoding. See this article: https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html#wire-format

    They use it for versioning and backwards compatibility, which is detailed here: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets

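    However, the Microsoft.Hadoop.Avro library you are using does not expect a magic byte when it serializes or deserializes. Try skipping the Confluent header (the magic byte plus the 4-byte schema ID that follows it in the wire format, 5 bytes in total) before calling Deserialize().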
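
A minimal sketch of stripping that header, assuming the message really uses the Confluent wire format from the article linked above (one magic byte with value 0, then a 4-byte big-endian schema ID, then the Avro payload) and that you work with the raw byte[] of the Kafka message value rather than a string. The class name ConfluentWireFormat and the method StripHeader are hypothetical names made up for this illustration; they are not part of Microsoft.Hadoop.Avro or any Confluent library.

```csharp
using System;

public static class ConfluentWireFormat
{
    // Assumed header layout: 1 magic byte (0) + 4-byte big-endian schema ID.
    private const int HeaderLength = 5;

    // Hypothetical helper: returns the Avro payload that follows the header
    // and reports the schema ID found in the header.
    public static byte[] StripHeader(byte[] message, out int schemaId)
    {
        if (message == null)
            throw new ArgumentNullException(nameof(message));
        if (message.Length < HeaderLength)
            throw new ArgumentException("Message is shorter than the 5-byte header.", nameof(message));
        if (message[0] != 0)
            throw new ArgumentException("Unexpected magic byte: " + message[0], nameof(message));

        // Bytes 1..4 hold the schema ID, most significant byte first.
        schemaId = (message[1] << 24) | (message[2] << 16) | (message[3] << 8) | message[4];

        // Copy everything after the header; this is the plain Avro binary data.
        var payload = new byte[message.Length - HeaderLength];
        Buffer.BlockCopy(message, HeaderLength, payload, 0, payload.Length);
        return payload;
    }
}
```

If the leading characters of the sample in the question (\0\0\0\u00032) are the raw bytes 00 00 00 03 32, StripHeader would report schema ID 818 and return a payload that starts with 'H' (0x48), which is the Avro length prefix for the 36-character string that follows. The payload can then be wrapped in a MemoryStream and passed to Deserialize() as in the question.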