Avro - java.io.IOException: Not a data file


I am using https://github.com/allegro/json-avro-converter to convert my JSON message into an Avro file. After calling the convertToAvro method I get a byte array, byte[] byteArrayJson. I then write it to disk with FileUtils from Apache Commons IO:

FileUtils.writeByteArrayToFile(new File("myFile.avro"), byteArrayJson);

The file is created, but when I try to convert it back to JSON with the command below, I get the exception that follows it:

java -jar avro-tools-1.8.1.jar tojson myFile.avro > testCheck.json


Exception in thread "main" java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
    at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:71)
    at org.apache.avro.tool.Main.run(Main.java:87)
    at org.apache.avro.tool.Main.main(Main.java:76)

I have written a JUnit test that uses the convertToJson method (from the same library) and asserts on the resulting strings, and everything passes. With the jar, however, it does not work. Am I doing something wrong? I am running the command from cmd rather than PowerShell, because I read in an SO post that PowerShell can change the encoding. I suspect an encoding problem, but I have no idea where to look. (My OS is Windows.)


Solution

  • The reason is that the Avro file does not contain the same data when it is produced in these two different ways, and this is expected behavior.

    As a test, use this command to generate the Avro file:

    java -jar avro-tools-1.8.2.jar fromjson --schema-file avroschema.json testCheck.json > myFile2.avro
    

    Now read this file and print it in Java, and notice that it does not contain ONLY the Avro record. It contains the schema as well (at least); see the data converted to a String below. This means the contents of an Avro file differ depending on whether it was generated with avro-tools or with the converter.

    bjavro.schemaœ{"type":"record","name":"Acme","fields":[{"name":"username","type":"string"}]}avro.c
    

    The validation in the avro-tools API fails when you use the tojson command to read a file generated by the converter, because tojson expects a data file that starts with this header.
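
    One way to see this for yourself: Avro object container files start with the four bytes 'O', 'b', 'j', 1, and a file that lacks them is not a data file as far as tojson is concerned. The sketch below checks only those first four bytes. The class name AvroMagicCheck and its method names are made up for illustration and are not part of avro-tools or the converter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of avro-tools or json-avro-converter.
class AvroMagicCheck {
    // Avro object container files begin with the bytes 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the given bytes begin with the container-file magic. */
    static boolean hasContainerMagic(byte[] data) {
        return data.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(data, MAGIC.length), MAGIC);
    }

    /** Reads the first four bytes of a file and checks them against the magic. */
    static boolean isContainerFile(Path file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        }
        return filled == head.length && hasContainerMagic(head);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the question.
        Path file = Paths.get(args.length > 0 ? args[0] : "myFile.avro");
        System.out.println(isContainerFile(file)
                ? file + ": container header found, tojson should accept it"
                : file + ": no container header, try fragtojson with the schema");
    }
}
```

    Run against the two files from this answer, one would expect myFile2.avro (written by fromjson) to have the header and myFile.avro (written from the converter's byte array) not to.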

    The correct avro-tools command for reading a file generated by the converter back into JSON is fragtojson. With it we are really reading only a fragment (a single Avro record here), so the schema has to be supplied separately:

    java -jar avro-tools-1.8.2.jar fragtojson --schema-file avroschema.json myFile.avro > myFile21.json
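
    To illustrate what such a fragment looks like, here is a minimal sketch that decodes one by hand. It assumes the converter output is a bare binary-encoded record for the Acme schema shown earlier (a single string field), which Avro encodes as a zigzag varint length followed by the UTF-8 bytes. The class AvroStringFragment and the sample value "bob" are hypothetical; a real program would use Avro's own decoder rather than this.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; use Avro's decoders in real code.
class AvroStringFragment {

    /**
     * Decodes a record whose only field is a string:
     * a zigzag-encoded varint length, then that many UTF-8 bytes.
     */
    static String decodeSingleString(byte[] data) {
        int pos = 0;
        long raw = 0;
        int shift = 0;
        int b;
        // Read the variable-length integer, 7 bits per byte, low bits first.
        do {
            if (pos >= data.length || shift > 63) {
                throw new IllegalArgumentException("truncated or malformed varint");
            }
            b = data[pos++] & 0xFF;
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);

        // Undo the zigzag encoding to recover the signed length.
        long length = (raw >>> 1) ^ -(raw & 1);
        if (length < 0 || length > data.length - pos) {
            throw new IllegalArgumentException("string length out of range: " + length);
        }
        return new String(data, pos, (int) length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // {"username": "bob"} as a bare fragment: length 3 is zigzag-encoded as 6.
        byte[] fragment = {6, 'b', 'o', 'b'};
        System.out.println(decodeSingleString(fragment)); // prints "bob"
    }
}
```

    Note that nothing in these bytes names the schema or the field, which is why fragtojson needs the --schema-file argument.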
    

    Another option is to avoid avro-tools altogether: create your own executable jar with the converter as a dependency, and use it to read the Avro records back as JSON.