In Avro IDL I have a Message record defined as follows:
record Message{
MessageId id;
array<string> dataField;
}
I am using this record in another record with a null union:
record Invoice{
...
union {null,array<Message>} message;
}
We have a Java Kafka consumer (we're using Confluent Platform) whose classes are generated by the avro-maven-plugin
version 1.10.2, configured with <stringType>String</stringType>.
When we make a call such as this:
List<String> msgList = message.getDataField();
for (String msg : msgList) {...}
we receive the following error on the second line: class org.apache.avro.util.Utf8 cannot be cast to class java.lang.String
Previously, the Invoice object was defined as:
record Invoice{
...
array<Message> message;
}
and we did not receive this error. We have found that in our schema file, changing from
"name" : "dataField",
"type" : {
"type" : "array",
"items" : "string"
}
to
"name" : "dataField",
"type" : {
"type" : "array",
"items" :{
"type": "string",
"avro.java.string" : "String"
}
}
corrects the problem.
I'm unclear as to why adding the union caused this change in behavior. Should I declare all of the strings in the schema with the avro.java.string property, and if so, how do I do that with Avro IDL?
At this point, there appear to be three ways to resolve this, at least when using Confluent Platform version 5.5.1 or later. I consider the problem to be an open defect in Avro.
The first option is to update the Avro Schema file with a global search and replace of "type":"string"
to
"type": {
"avro.java.string": "String",
"type": "string"
}
This first option would need to be done after creating any files via Avro IDL since it doesn't support this construct, making IDL less useful in this case. Strangely, this approach does not appear to impact records that come in via REST Proxy that have "type":"string" associated without the additional avro.java.string information. They appear able to use a schema defined in either way; I was expecting the updated schema with the avro.java.string information to cause problems with REST Proxy requests that don't have that detail.
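For illustration, the search and replace described above could be scripted rather than done by hand. This is only a minimal sketch using the Java standard library: the class name StringTypeRewriter and the method annotate are hypothetical, and it rewrites only the literal "type":"string" form (with optional whitespace). It does not touch shorthand such as "items" : "string", and running it twice on the same file would nest the annotation, so it should be applied once to freshly generated schema text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Avro or Confluent Platform.
class StringTypeRewriter {
    // Matches "type":"string" with optional whitespace around the colon.
    private static final Pattern PLAIN_STRING =
            Pattern.compile("\"type\"\\s*:\\s*\"string\"");

    // The annotated form from the first option above.
    private static final String ANNOTATED =
            "\"type\": {\"avro.java.string\": \"String\", \"type\": \"string\"}";

    // Replaces every plain string type declaration in a single pass.
    static String annotate(String schemaJson) {
        return PLAIN_STRING.matcher(schemaJson)
                .replaceAll(Matcher.quoteReplacement(ANNOTATED));
    }

    public static void main(String[] args) {
        String field = "{\"name\": \"note\", \"type\": \"string\"}";
        System.out.println(annotate(field));
    }
}
```

A regular expression is used here only because the option is described as a textual search and replace; a JSON-aware rewrite would be more robust for schemas that mix both notations.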
The second option is to set auto.register.schemas=false and use.latest.version=true, though this may cause unintended consequences with compatibility in the future.
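As a rough sketch of the second option, the two settings could be collected in a java.util.Properties object and merged into the client configuration. The class and method names below are hypothetical, and everything else a real client needs (bootstrap servers, the Schema Registry URL, serializer classes) is deliberately left out; only the two property keys come from the option described above.

```java
import java.util.Properties;

// Hypothetical holder for the two settings; merge these into the real client configuration.
class SchemaRegistrySettings {
    static Properties secondOption() {
        Properties props = new Properties();
        // Do not register the schema compiled into the application.
        props.setProperty("auto.register.schemas", "false");
        // Use the latest schema version already registered for the subject.
        props.setProperty("use.latest.version", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = secondOption();
        System.out.println(props.getProperty("auto.register.schemas")); // prints "false"
        System.out.println(props.getProperty("use.latest.version"));    // prints "true"
    }
}
```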
The third option is to just not use the <stringType> directive in the avro-maven-plugin configuration. This means a lot of coding around the CharSequence type that is used by default, usually in the form of .toString() calls.
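If the third option is chosen, the repeated .toString() calls could be gathered into one small helper. This is a sketch only: CharSequences and toStrings are hypothetical names, and a StringBuilder stands in for org.apache.avro.util.Utf8 so that the example runs without Avro on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Avro.
class CharSequences {
    // Copies any list of CharSequence values into a list of java.lang.String.
    static List<String> toStrings(List<? extends CharSequence> values) {
        List<String> result = new ArrayList<>(values.size());
        for (CharSequence value : values) {
            // Preserve nulls rather than failing on them.
            result.add(value == null ? null : value.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // StringBuilder is a stand-in for Utf8; both implement CharSequence.
        List<CharSequence> raw = Arrays.asList(new StringBuilder("first"), "second");
        System.out.println(toStrings(raw)); // prints [first, second]
    }
}
```

With generated getters that return List<CharSequence>, a call might then look like List<String> msgList = CharSequences.toStrings(message.getDataField());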