I want to read a Parquet file with AvroParquetReader in Java. The file footer specifies the model class, e.g. com.bigcompany.model.Foo, but when the file was written the model class had one fewer field. The extra field was not added at the end of the model class but in the middle, so the read currently fails: it takes a String value from the file and tries to put it into an integer field of the model object.
For the sake of argument, I cannot use an old version of the model or modify the file. I need to work with the existing file and the current model version, and read the file into the model object while ignoring the additional field.
Are there options for AvroParquetReader that will allow me to do this?
My solution is to read the file as GenericData records. This avoids the automatic conversion to the model object that happens when you use SpecificData.
After that, I can use reflection to grab the GenericData.Record object from inside the ParquetRecord:
// "data" is the private field of ParquetRecord that holds the generic record
Field dataField = ParquetRecord.class.getDeclaredField("data");
ReflectionUtils.makeAccessible(dataField);
GenericData.Record genericDataRecord = (GenericData.Record) ReflectionUtils.getField(dataField, parquetRecord);
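If Spring's ReflectionUtils is not on the classpath, the same private-field read can be done with plain java.lang.reflect. The following is only a minimal sketch: WrapperSketch is a hypothetical stand-in for ParquetRecord (all that matters is that it has a private field named "data"), and PrivateFieldReader is a made-up helper name, not part of any library.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for the ParquetRecord wrapper: it only has a private "data" field.
class WrapperSketch {
    private final Object data;

    WrapperSketch(Object data) {
        this.data = data;
    }
}

// Hypothetical helper showing the JDK-only equivalent of makeAccessible + getField.
class PrivateFieldReader {
    // Looks up a declared field by name on the target's class and returns its value.
    static Object readPrivateField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // bypass the private modifier
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        WrapperSketch wrapper = new WrapperSketch("payload");
        System.out.println(readPrivateField(wrapper, "data"));
    }
}
```

In the real code the returned Object would be cast to GenericData.Record, exactly as in the lines above.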
From the record's schema name we can decide which model object we need to create:
genericDataRecord.getSchema().getName();
and based on that, map it like so:
com.bigcompany.model.Foo modelFoo = new Foo();
// copy each field of the file's schema into the model by field name
genericDataRecord.getSchema().getFields().stream()
.forEach(f -> modelFoo.put(f.name(), genericDataRecord.get(f.pos())));
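The reason this mapping works is that values are matched by field name instead of by position, so a field inserted in the middle of the model no longer shifts everything after it. Here is a rough, library-free sketch of that idea, using plain maps in place of Avro records. NameBasedCopy, copyByName and the field names id, count and name are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of name-based copying; maps stand in for Avro records.
class NameBasedCopy {
    // Builds a record with the model's field order, taking each value from the
    // file record by name. Fields the file does not contain are left as null.
    static Map<String, Object> copyByName(Map<String, Object> fileRecord, List<String> modelFields) {
        Map<String, Object> model = new LinkedHashMap<>();
        for (String name : modelFields) {
            model.put(name, fileRecord.get(name));
        }
        return model;
    }

    public static void main(String[] args) {
        // The file was written with two fields: id and name.
        Map<String, Object> fileRecord = new LinkedHashMap<>();
        fileRecord.put("id", 1);
        fileRecord.put("name", "a");

        // The current model has an extra field, count, inserted in the middle.
        List<String> modelFields = List.of("id", "count", "name");

        System.out.println(copyByName(fileRecord, modelFields));
    }
}
```

With positional copying, the String stored under name would land in the integer slot for count, which is the kind of type mismatch described in the question. Matching by name leaves count unset instead.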