At my company we use Avro in our Kafka workflow. As part of this workflow we use binary serialization/deserialization to notify our API with either a success object or a failure object. We've found that, under specific data conditions, an Avro-serialized object can be deserialized into the wrong type: in our scenario, a serialized success object deserializes cleanly into the failure type. This is causing issues in our API, since the incorrect type is being produced.
I am relatively new to Avro, but from reading the documentation found here, it seems this could be a result of how its binary serialization operates.
I wrote a sample .NET 6 console application that replicates the issue.
Packages:
Avro Schemas:
{
  "type": "record",
  "namespace": "BinarySerializationTests.Schema",
  "name": "FailedObject",
  "fields": [
    {
      "name": "Id",
      "doc": "Identifier",
      "type": "string"
    },
    {
      "name": "ErrorCode",
      "doc": "The failure's error code",
      "type": "string"
    },
    {
      "name": "Errors",
      "doc": "Collection of all errors",
      "type": {
        "type": "array",
        "items": "string"
      }
    }
  ]
}
{
  "type": "record",
  "namespace": "BinarySerializationTests.Schema",
  "name": "SuccessObject",
  "fields": [
    { "name": "Id", "type": "string" },
    { "name": "RequiredProperty1", "type": "string" },
    { "name": "RequiredProperty2", "type": "string" },
    { "name": "FirstName", "type": "string" },
    { "name": "LastName", "type": "string" },
    { "name": "Suffix", "type": [ "null", "string" ] },
    { "name": "ExtraProperty1", "type": [ "null", "string" ] },
    { "name": "ExtraProperty2", "type": [ "null", "string" ] },
    { "name": "ExtraProperty3", "type": [ "null", { "type": "int", "logicalType": "date" } ] },
    { "name": "Email", "type": "string" },
    { "name": "CellPhone", "type": [ "null", "string" ] },
    { "name": "ExtraProperty4", "type": [ "null", "string" ] },
    { "name": "ExtraProperty5", "type": [ "null", "string" ] },
    { "name": "ExtraProperty6", "type": [ "null", "string" ] },
    { "name": "ExtraProperty7", "type": [ "null", "string" ] },
    { "name": "Address1", "type": [ "null", "string" ] },
    { "name": "Address2", "type": [ "null", "string" ] },
    { "name": "City", "type": [ "null", "string" ] },
    { "name": "State", "type": [ "null", "string" ] },
    { "name": "Zip", "type": [ "null", "string" ] },
    { "name": "OfficePhone", "type": [ "null", "string" ] },
    { "name": "ExtraProperty8", "type": [ "null", "string" ] },
    { "name": "TimeZone", "type": "string" },
    { "name": "ExtraProperty9", "type": [ "null", "string" ] }
  ]
}
The C# classes were generated from the schemas above with the avrogen tool; I can provide them if needed.
Test Console Code:
using Avro.IO;
using Avro.Specific;
using BinarySerializationTests.Schema;

static byte[] Serialize<T>(T thisObj) where T : ISpecificRecord
{
    using (MemoryStream memoryStream = new MemoryStream())
    {
        BinaryEncoder binaryEncoder = new BinaryEncoder(memoryStream);
        new SpecificDefaultWriter(thisObj.Schema).Write(thisObj, binaryEncoder);
        return memoryStream.ToArray();
    }
}

static T Deserialize<T>(byte[] bytes) where T : ISpecificRecord, new()
{
    using (MemoryStream memoryStream = new MemoryStream(bytes))
    {
        BinaryDecoder binaryDecoder = new BinaryDecoder(memoryStream);
        T thisObj = new T();
        new SpecificDefaultReader(thisObj.Schema, thisObj.Schema).Read(thisObj, binaryDecoder);
        return thisObj;
    }
}

static bool TryDeserialize<T>(byte[] bytes, out T obj) where T : ISpecificRecord, new()
{
    obj = default;
    try
    {
        obj = Deserialize<T>(bytes);
        return true;
    }
    catch (Exception)
    {
        // Any decode failure means the bytes don't fit this schema.
        return false;
    }
}
var success = new SuccessObject
{
    Id = Guid.NewGuid().ToString(),
    RequiredProperty1 = "RD",
    RequiredProperty2 = "400000",
    FirstName = "FirstName",
    LastName = "LastN",
    Suffix = "ABC",
    ExtraProperty1 = null,
    ExtraProperty2 = null,
    Email = "rdlastn@email.com",
    CellPhone = "",
    ExtraProperty4 = null,
    ExtraProperty5 = null,
    ExtraProperty6 = "",
    ExtraProperty7 = "",
    Address1 = "1 Wilson Road",
    Address2 = "",
    City = "Annapolis",
    State = "MD",
    Zip = "21402-0001",
    OfficePhone = "",
    ExtraProperty8 = "random",
    TimeZone = "Eastern Standard Time",
    ExtraProperty9 = null
};

var serializedSuccess = Serialize(success);

if (TryDeserialize(serializedSuccess, out FailedObject failedObject))
{
    Console.WriteLine("Deserialized as FailedObject");
}

if (TryDeserialize(serializedSuccess, out SuccessObject successObject))
{
    Console.WriteLine("Deserialized as SuccessObject");
}
I found that the data above for the SuccessObject type is quite finicky as to what causes it to be deserialized as the FailedObject type.
For instance, if I change the property Suffix from "ABC" to "A", it no longer deserializes as the FailedObject type.
Alternatively, if I change the property RequiredProperty2 from "400000" to "500000", it no longer deserializes as the FailedObject type.
When the SuccessObject does deserialize as the FailedObject type, the deserializer assigns values to the properties based on their order in the record, which I believe matches the Avro documentation.
Does anyone have insight into why this issue occurs, or any potential workarounds?
The only solution I could think of was to combine the two types into a single "response" type that includes a "success" bool to distinguish success from failure.
Unless anyone has a better answer, we used the following solution.
Since the Avro binary encoding contains no property-name information, the decoder simply assigns values in field order as long as the types line up. In our example, this meant the Errors array of strings in FailedObject was able to consume the string values of SuccessObject, one after another, until it reached the first property that wasn't a string (ExtraProperty3).
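To see why positional decoding can "succeed" against the wrong schema, here is a small Python sketch of Avro's binary primitives as described in the Avro specification (zigzag varint longs, length-prefixed strings); this is purely illustrative, not our production code:

```python
def encode_long(n):
    """Avro long: zigzag-encode, then write as a base-128 varint (LSB first)."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small magnitudes to small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set = more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)

def decode_long(buf, pos=0):
    """Read a zigzag varint from buf; returns (value, next_position)."""
    shift = z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos

def encode_string(s):
    """Avro string: long length, then UTF-8 bytes. No field name, no type tag."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

# Two consecutive string fields, as a SuccessObject writer emits them:
payload = encode_string("RD") + encode_string("400000")

# A reader whose schema says "string, then array" consumes the same bytes:
_, pos = decode_long(payload, 0)   # length prefix of "RD"
pos += 2                           # skip the two bytes "RD"
count, _ = decode_long(payload, pos)
print(count)  # 6 -- the length prefix of "400000" reads as an array block count
```

The length prefix of the string "400000" (byte 0x0C, zigzag for 6) is indistinguishable from an array block count of 6, so the FailedObject reader happily starts reading a six-item Errors array from the bytes that follow.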
With that understanding, our solution was to combine the two possible outcome schemas into a single schema that tells us whether the operation succeeded or failed, along with any related data. This way we only ever deserialize the binary data into one type.
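For reference, a combined schema along these lines is what we mean; the exact field set here is illustrative (the real one carries whatever success data applies), but the key point is a single record type with a leading success flag:

```json
{
  "type": "record",
  "namespace": "BinarySerializationTests.Schema",
  "name": "ResponseObject",
  "fields": [
    { "name": "Success", "doc": "True if the operation succeeded", "type": "boolean" },
    { "name": "Id", "doc": "Identifier", "type": "string" },
    { "name": "ErrorCode", "doc": "Populated on failure, otherwise null", "type": [ "null", "string" ] },
    { "name": "Errors", "doc": "Collection of all errors (empty on success)", "type": { "type": "array", "items": "string" } }
  ]
}
```

Since every message now decodes against one schema, there is no second schema for the bytes to accidentally satisfy.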