I am trying to generate an Avro schema using Jackson's Avro schema generator (AvroSchemaGenerator). I'm surprised that the resulting schema contains so many Java class names.
I'm not an Avro expert, but the format looks odd and possibly incorrect.
DTO:
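package test.model;

import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.Set;
import javax.validation.Valid; // or jakarta.validation.Valid on newer stacks

public class TestDto {
    @JsonProperty("id")
    private Long id;

    @JsonProperty("labels")
    @Valid
    private Set<String> labels;
}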
Mapper:
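import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;
import com.fasterxml.jackson.dataformat.avro.schema.AvroSchemaGenerator;

private AvroSchema generateAvroSchema() throws JsonMappingException {
    AvroMapper avroMapper = new AvroMapper();
    AvroSchemaGenerator gen = new AvroSchemaGenerator();
    // Let Jackson walk TestDto's properties and build the Avro schema
    avroMapper.acceptJsonFormatVisitor(TestDto.class, gen);
    return gen.getGeneratedSchema();
}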
Schema:
{
  "type": "record",
  "name": "TestDto",
  "namespace": "test.model",
  "fields": [
    {
      "name": "id",
      "type": [
        "null",
        {
          "type": "long",
          "java-class": "java.lang.Long"
        }
      ]
    },
    {
      "name": "labels",
      "type": [
        "null",
        {
          "type": "array",
          "items": "string",
          "java-class": "java.util.Set"
        }
      ]
    }
  ]
}
Can this schema be used at all by other languages, such as Python?
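The "java-class" property records which Java implementation class a field corresponds to when the schema is generated dynamically from Java classes. The Avro specification permits attributes it does not define as metadata, as long as they don't affect the format of the serialized data, and implementations in other languages generally ignore properties they don't recognize. So the schema should be usable from other languages such as Python.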
The Avro guide doesn't seem to describe this anywhere; however, it is mentioned in the Javadocs for Avro's reflection package (org.apache.avro.reflect):
Collection implementations are mapped to Avro array schemas with the "java-class" property set to the collection implementation, e.g.:
{"type": "array", "java-class": "java.util.ArrayList"}
Avro IDL also has a direct counterpart (the @java-class annotation), which is documented here.
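If you would still prefer a schema without the Java-specific hints, one option is to post-process the generated JSON. The following is a crude, text-based sketch only: the class name StripJavaClass and its strip method are hypothetical (not part of Jackson or Avro), it assumes every "java-class" value is a plain class name with no escaped quotes, and it is not a substitute for a real JSON parser. Keep in mind that any Java code relying on the hint (for example Avro's reflection API) would no longer see it.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper: crude, regex-based removal of Avro "java-class"
 * properties from a schema JSON string. Assumes values are plain class
 * names without escaped quotes; this is NOT a general JSON parser.
 */
public class StripJavaClass {

    // Matches a java-class property that follows another property: , "java-class": "..."
    private static final Pattern AFTER_OTHER =
            Pattern.compile("\\s*,\\s*\"java-class\"\\s*:\\s*\"[^\"]*\"");

    // Matches a java-class property that precedes another property: "java-class": "...",
    private static final Pattern BEFORE_OTHER =
            Pattern.compile("\"java-class\"\\s*:\\s*\"[^\"]*\"\\s*,\\s*");

    public static String strip(String schemaJson) {
        // Remove trailing-position properties first, then leading-position ones,
        // so the surrounding commas stay balanced in both cases.
        String result = AFTER_OTHER.matcher(schemaJson).replaceAll("");
        return BEFORE_OTHER.matcher(result).replaceAll("");
    }

    public static void main(String[] args) {
        String field = "{\"type\": \"array\", \"items\": \"string\", \"java-class\": \"java.util.Set\"}";
        System.out.println(strip(field));
        // prints {"type": "array", "items": "string"}
    }
}
```

Because \s also matches newlines, the same patterns work on pretty-printed output, such as the string returned by gen.getGeneratedSchema().getAvroSchema().toString(true).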