I'm attempting to write an Avro file from Python, for the most part following the official tutorial.
I have what appears to be a valid schema:
{"namespace": "example.avro",
 "type": "record",
 "name": "Stock",
 "fields": [
     {"name": "ticker_symbol", "type": "string"},
     {"name": "sector", "type": "string"},
     {"name": "change", "type": "float"},
     {"name": "price", "type": "float"}
 ]
}
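Here is the relevant code:
from io import BytesIO

from avro import schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

avro_schema = schema.parse(open("stock.avsc", "rb").read())
output = BytesIO()
writer = DataFileWriter(output, DatumWriter(), avro_schema)
# _generate_fake_data() is defined elsewhere (not shown)
for i in range(1000):
    writer.append(_generate_fake_data())
writer.flush()
with open('record.avro', 'wb') as f:
    f.write(output.getvalue())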
However, when I try to read this file using the avro-tools CLI:
avro-tools fragtojson --schema-file stock.avsc ./record.avro --no-pretty
I get the following error:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/usr/local/Cellar/avro-tools/1.8.2/libexec/avro-tools-1.8.2.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -40
at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
at org.apache.avro.tool.BinaryFragmentToJsonTool.run(BinaryFragmentToJsonTool.java:82)
at org.apache.avro.tool.Main.run(Main.java:87)
at org.apache.avro.tool.Main.main(Main.java:76)
I'm pretty sure the relevant error is
Malformed data. Length is negative: -40
But I can't tell what I'm doing wrong. My suspicion is that I'm writing the Avro file incorrectly.
I want to write to a bytes array (instead of directly to a file like in the example) because ultimately I'm going to ship this Avro buffer off to AWS Kinesis Firehose using boto3.
I was using the wrong tool to read the file. I should have used
avro-tools tojson ./record.avro
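instead of fragtojson as in the question. The difference is that fragtojson decodes a single binary-encoded Avro datum with no file header, whereas tojson reads an entire Avro data file (an object container file, which is what DataFileWriter produces).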
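As a rough illustration of where the -40 probably comes from (this is my reading of the stack trace, not something avro-tools reports): an Avro data file starts with the four magic bytes `Obj\x01`, and a string datum starts with its length as a zigzag-encoded long. If fragtojson treats the start of the file as a Stock record, it reads the byte `O` as the length of ticker_symbol. The helper names below (`read_zigzag_long`, `is_container_file`) are made up for this sketch and are not part of the avro package.

```python
# Magic bytes at the start of every Avro object container file (per the Avro spec).
AVRO_CONTAINER_MAGIC = b"Obj\x01"


def read_zigzag_long(buf: bytes) -> int:
    """Decode one Avro long (varint + zigzag) from the start of buf.

    Hypothetical helper for illustration only.
    """
    raw = 0
    shift = 0
    for byte in buf:
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # a clear high bit marks the last byte of the varint
            break
        shift += 7
    else:
        raise ValueError("truncated varint")
    # Undo zigzag: even values map to non-negative numbers, odd values to negative ones.
    return (raw >> 1) ^ -(raw & 1)


def is_container_file(buf: bytes) -> bool:
    """True if buf looks like a full Avro data file (use tojson, not fragtojson)."""
    return buf.startswith(AVRO_CONTAINER_MAGIC)


# Reading the file header as if it were the string length of the first field:
print(read_zigzag_long(AVRO_CONTAINER_MAGIC))  # -40
print(is_container_file(AVRO_CONTAINER_MAGIC + b"rest of header"))  # True
```

So the buffer written in the question is most likely fine; a quick check like `is_container_file(output.getvalue())` can tell you which of the two avro-tools commands applies.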