
Is this avro message valid?


I have some example Avro messages from a Kafka provider. Each one starts like this:

00000000  4f 62 6a 01 04 16 61 76  72 6f 2e 73 63 68 65 6d  |Obj...avro.schem|
00000010  61 ef bf bd 24 7b 22 74  79 70 65 22 3a 22 72 65  |a...${"type":"re|

I expected that ef bf bd 24 to be the length of the schema, which is 2332 bytes. I'm having trouble confirming that the zigzag varint decodes to the right value (and why would a length, which can never be negative, be zigzagged?). By my reckoning it comes out somewhere in the 200K range.
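
To check the arithmetic by hand, here is a minimal Python sketch of Avro's long encoding (zigzag, then base-128 varint, least-significant group first), applied to the bytes from the dump above. The helper names read_long and write_long are made up for this illustration and are not part of any Avro library. Avro writes the length prefix of bytes and string values as an ordinary long, which is why it is zigzagged even though a length is never negative.

```python
def read_long(buf, pos=0):
    """Decode one Avro long (zigzag varint) from buf, starting at pos.

    Hypothetical helper for illustration. Returns (value, next_pos).
    """
    raw = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        raw |= (b & 0x7F) << shift   # low 7 bits carry data
        if not (b & 0x80):           # high bit clear: last byte of the varint
            break
        shift += 7
    # Undo zigzag: even raw values are non-negative, odd ones are negative.
    return (raw >> 1) ^ -(raw & 1), pos


def write_long(n):
    """Encode n as an Avro long (hypothetical helper, inverse of read_long)."""
    raw = (n << 1) ^ (n >> 63)       # zigzag, assuming n fits in 64 bits
    out = bytearray()
    while raw & ~0x7F:
        out.append((raw & 0x7F) | 0x80)
        raw >>= 7
    out.append(raw)
    return bytes(out)


# The first 21 bytes from the hex dump in the question.
header = bytes.fromhex("4f626a01" "04" "16" "6176726f2e736368656d61" "efbfbd24")

magic = header[:4]                             # b"Obj\x01"
entries, pos = read_long(header, 4)            # metadata map block count: 2
key_len, pos = read_long(header, pos)          # 11
key = header[pos:pos + key_len]                # b"avro.schema"
schema_len, pos = read_long(header, pos + key_len)

print(magic, entries, key, schema_len)         # schema_len is -38252536
print(write_long(2332).hex())                  # b824

# What a lossy UTF-8 round trip does to the expected prefix b8 24:
mangled = b"\xb8\x24".decode("utf-8", errors="replace").encode("utf-8")
print(mangled.hex())                           # efbfbd24
```

Under these assumptions, ef bf bd 24 decodes to a negative number, and a negative capacity is one of the things that makes ByteBuffer.allocate throw IllegalArgumentException. The prefix for a 2332-byte schema would be b8 24. Since ef bf bd is the UTF-8 encoding of U+FFFD (the replacement character), the dump is at least consistent with the byte b8 having been replaced when the data went through a UTF-8 text decode somewhere along the way. That is a guess from these few bytes, not a confirmed diagnosis.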

I believe that's why I can't use the avro-tools jar on it at all, whether for getmeta, getschema, or converting to JSON.

Is this a known issue with either Avro Tools 1.8.2 itself or with running that version on macOS with Java 1.8.0_102-b14?

Or does it look like the file has been mis-encoded? Every call to the tools gives me:

$ java -jar ~/Downloads/avro-tools-1.8.2.jar tojson dt20170607hr08_1496793109907_11_8229967.bin.1
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
    at org.apache.avro.io.BinaryDecoder.readBytes(BinaryDecoder.java:288)
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:112)
    at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
    at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:71)
    at org.apache.avro.tool.Main.run(Main.java:87)
    at org.apache.avro.tool.Main.main(Main.java:76)

Solution

  • It looks like you have a single record in the Avro file, and the system generating the file is running an older Avro version. I had a similar issue with NiFi running 1.7.7. By merging two records into the Avro file, we were able to work around it.

    Avro 1.8.2 fixes the bug. Versions 1.7.7, 1.8.0, and 1.8.1 all have the single-record issue.

    https://issues.apache.org/jira/browse/AVRO-1888