
How to get Avro schema validation to support field aliases?


I have renamed a field in a record serialised with Avro. I want to support reading the old versions of the data, without a schema registry. Therefore I keep all versions of the schema as resources loaded from the classpath.
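A minimal sketch of keeping every schema version on the classpath, assuming the schemas are stored as `.avsc` files under hypothetical resource paths such as `schemas/foo-v1.avsc`. The class name `SchemaHistory` and the resource names are illustrative, not from the original. One design point matters here: each version gets its own `Schema.Parser`, because a single parser instance remembers the named types it has already parsed and rejects a second definition of the same record name (`foo`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;

public class SchemaHistory {

    // Parses one schema version. A fresh Schema.Parser is used each time because a
    // parser keeps a registry of named types and refuses to redefine "foo".
    static Schema parse(InputStream in) throws IOException {
        return new Schema.Parser().parse(in);
    }

    // Loads every schema version from the classpath, oldest first.
    // The resource names passed in are hypothetical; use your project's own paths.
    static List<Schema> loadFromClasspath(String... resourceNames) throws IOException {
        List<Schema> versions = new ArrayList<>();
        for (String name : resourceNames) {
            try (InputStream in = SchemaHistory.class.getClassLoader().getResourceAsStream(name)) {
                if (in == null) {
                    throw new IOException("Schema resource not found: " + name);
                }
                versions.add(parse(in));
            }
        }
        return versions;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical resource names for two versions of the "foo" record.
        List<Schema> versions = loadFromClasspath("schemas/foo-v1.avsc", "schemas/foo-v2.avsc");
        System.out.println("Loaded " + versions.size() + " schema versions");
    }
}
```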

This works great and supports schema evolution: I can read data serialised with the old schemas as long as they are backward compatible. To make sure of this, I want to verify the schemas when the application starts. Unfortunately, the schema validation does not honour field aliases, even though decoding the data does.
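The claim that decoding honours aliases can be checked in isolation. This sketch (class name `AliasDecoding` and the value `"hello"` are illustrative assumptions) writes a record with the old schema and reads it back with the renamed one by passing both writer and reader schemas to `GenericDatumReader`. That constructor makes the reader resolve the written data against the reader schema, and that resolution step is where field aliases are applied.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AliasDecoding {

    // Old schema: the field is still called "test1".
    static final Schema WRITER = SchemaBuilder.record("foo").fields()
            .requiredString("test1")
            .endRecord();

    // New schema: the field is renamed to "test2" and keeps "test1" as an alias.
    static final Schema READER = SchemaBuilder.record("foo").fields()
            .name("test2").aliases("test1").type().stringType().noDefault()
            .endRecord();

    // Serialises a record with the old schema, then decodes it with the new one.
    static String readRenamedField(String value) throws IOException {
        GenericRecord old = new GenericData.Record(WRITER);
        old.put("test1", value);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(WRITER).write(old, encoder);
        encoder.flush();

        // Writer schema first, reader schema second: the reader resolves between them.
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(WRITER, READER);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord current = reader.read(null, decoder);

        // Avro strings decode as Utf8 by default, so convert to String.
        return current.get("test2").toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readRenamedField("hello")); // prints hello
    }
}
```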

Here is a simple example that proves my point:

import java.util.Collections;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.SchemaValidationException;
import org.apache.avro.SchemaValidatorBuilder;


public class Bar {
    public static void main(String[] args) throws SchemaValidationException {
        Schema stringType = SchemaBuilder.builder().stringType();
        // Old version of the record: the field is called "test1".
        Schema s1 = SchemaBuilder.builder().record("foo").fields()
                .name("test1").type(stringType).noDefault()
                .endRecord();
        // New version: the field is renamed to "test2", keeping "test1" as an alias.
        Schema s2 = SchemaBuilder.builder().record("foo").fields()
                .name("test2").aliases("test1").type(stringType).noDefault()
                .endRecord();

        // Check that the latest schema (s2) can read data written with s1.
        new SchemaValidatorBuilder().canReadStrategy().validateLatest().validate(s2, Collections.singleton(s1));
    }
}

This throws the following exception:

Exception in thread "main" org.apache.avro.SchemaValidationException: Unable to read schema: 
{
  "type" : "record",
  "name" : "foo",
  "fields" : [ {
    "name" : "test1",
    "type" : "string"
  } ]
}
using schema:
{
  "type" : "record",
  "name" : "foo",
  "fields" : [ {
    "name" : "test2",
    "type" : "string",
    "aliases" : [ "test1" ]
  } ]
}
    at org.apache.avro.ValidateMutualRead.canRead(ValidateMutualRead.java:70)
    at org.apache.avro.ValidateCanRead.validate(ValidateCanRead.java:40)
    at org.apache.avro.ValidateLatest.validate(ValidateLatest.java:51)
    at Bar.main(Bar.java:18)

Solution

  • Sorry to answer my own question:

    I found a variation of this question asked, but not answered, on the Avro user mailing list: "Different behavior between SchemaValidator and SchemaCompatibility regarding aliased field".

    To me it looks like a bug in SchemaValidator, but I don't understand why Avro has both SchemaValidator and SchemaCompatibility, so I suspect I'm missing something.

    In short, use SchemaCompatibility.checkReaderWriterCompatibility instead of SchemaValidatorBuilder. It seems to be more complete and reuses the decoding logic.
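A sketch of a startup check built on SchemaCompatibility, reusing the same two record shapes as the question. It assumes, as reported above, that `checkReaderWriterCompatibility(reader, writer)` looks up reader field aliases; note the argument order is reader first, writer second. The class `StartupSchemaCheck` and the helpers `canRead`, `requireReadable`, `v1`, `v2` and `v2WithoutAlias` are hypothetical names for illustration.

```java
import java.util.Collections;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

public class StartupSchemaCheck {

    // True if data written with `writer` can be decoded using `reader`.
    static boolean canRead(Schema reader, Schema writer) {
        SchemaPairCompatibility result =
                SchemaCompatibility.checkReaderWriterCompatibility(reader, writer);
        return result.getType() == SchemaCompatibilityType.COMPATIBLE;
    }

    // Fails fast if the latest schema cannot read every historical version.
    static void requireReadable(Schema latest, List<Schema> history) {
        for (Schema old : history) {
            SchemaPairCompatibility result =
                    SchemaCompatibility.checkReaderWriterCompatibility(latest, old);
            if (result.getType() != SchemaCompatibilityType.COMPATIBLE) {
                throw new IllegalStateException(
                        "Cannot read " + old + " using " + latest + ": " + result.getDescription());
            }
        }
    }

    // Old version: field "test1".
    static Schema v1() {
        return SchemaBuilder.record("foo").fields().requiredString("test1").endRecord();
    }

    // New version: field renamed to "test2" with "test1" as an alias.
    static Schema v2() {
        return SchemaBuilder.record("foo").fields()
                .name("test2").aliases("test1").type().stringType().noDefault()
                .endRecord();
    }

    // Same rename without the alias (and no default), so old data is unreadable.
    static Schema v2WithoutAlias() {
        return SchemaBuilder.record("foo").fields().requiredString("test2").endRecord();
    }

    public static void main(String[] args) {
        requireReadable(v2(), Collections.singletonList(v1())); // passes: alias honoured
        System.out.println(canRead(v2WithoutAlias(), v1()));    // prints false
    }
}
```

Throwing from `requireReadable` keeps the application from starting with a schema set it cannot decode, which is the same fail-fast behaviour the question wanted from SchemaValidator.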