Search code examples
javaavrohortonworks-data-platform

Reading Avro files in Java application with Hortonworks Schema Registry


I have an application that is writing files in Avro format (multiple records per file) but I cannot read it in another Java app. Here's what I've tried

Map<String, Object> registryConfig = new HashMap<>();
registryConfig.put("schema.registry.client.class.loader.cache.size", 10L);
registryConfig.put("schema.registry.url", "http://localhost:9090/api/v1");
registryConfig.put("schema.registry.client.class.loader.cache.expiry.interval.secs", 10L);
registryConfig.put("schema.registry.deserializer.schema.cache.size", 10L);
registryConfig.put("schema.registry.client.schema.metadata.cache.size", 10L);
registryConfig.put("schema.registry.client.schema.text.cache.expiry.interval.secs", 10000L);
registryConfig.put("schema.registry.client.schema.version.cache.expiry.interval.secs", 10000L);
registryConfig.put("schema.registry.client.schema.metadata.cache.expiry.interval.secs", 10L);
registryConfig.put("specific.avro.reader", false);
registryConfig.put("schema.registry.client.schema.version.cache.size", 10L);
registryConfig.put("schema.registry.client.schema.version.text.size", 10L);
registryConfig.put("schemaregistry.deserializer.schema.cache.expiry.secs", 10000L);

SchemaRegistryClient registryClient = new SchemaRegistryClient(registryConfig);

AvroSnapshotDeserializer deserializer = new AvroSnapshotDeserializer(registryClient);
deserializer.init(registryConfig);

Path p = Paths.get("/tmp/dump.avro");
InputStream is = Files.newInputStream(p);
deserializer.deserialize(is);

But it throws

Exception in thread "main" com.hortonworks.registries.schemaregistry.serdes.avro.exceptions.AvroException: Unknown protocol id [79] received while deserializing the payload
  at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer.checkProtocolHandlerExists(AvroSnapshotDeserializer.java:70)
  at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer.retrieveProtocolId(AvroSnapshotDeserializer.java:63)
  at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer.retrieveProtocolId(AvroSnapshotDeserializer.java:32)
  at com.hortonworks.registries.schemaregistry.serde.AbstractSnapshotDeserializer.deserialize(AbstractSnapshotDeserializer.java:141)
  at com.hortonworks.registries.schemaregistry.serde.AbstractSnapshotDeserializer.deserialize(AbstractSnapshotDeserializer.java:55)
  at com.hortonworks.registries.schemaregistry.serde.SnapshotDeserializer.deserialize(SnapshotDeserializer.java:60)

I know it would be difficult for you to reproduce this problem as it requires my schema registry and a file. I hope though, that I am doing something silly here. Any help would be appreciated.


Solution

  • Okay... I've realized that 79 from the error message is ASCII code of the letter O. I then double checked if my files are REALLY using schema registry - it turns out they don't. They are just Avro files with embedded schema. Thus, I don't need Hortonworks' AvroSnapshotDeserializer - simple DataFileReader will do.