
Python Avro writer.append doesn't work when a json string is passed as a variable.


Avro Schema file: user.avsc

{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "TransportProtocol", "type": "string"}
 ]
}

Pasting my code snippet that works:-

import json
from avro import schema, datafile, io
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

schema = avro.schema.parse(open("user.avsc").read())
writer = DataFileWriter(open("users.avro", "w"), DatumWriter(), schema)
writer.append({"TransportProtocol": "udp"})
writer.close()

Pasting my code snippet that doesn't work:-

dummy_json = '{"TransportProtocol": "udp"}'
schema = avro.schema.parse(open("user.avsc").read())
writer = DataFileWriter(open("users.avro", "w"), DatumWriter(), schema)
writer.append(dummy_json)
writer.close()

When I pass the JSON directly in the append call, it works and I get the desired Avro output. But if I assign the JSON string to a variable and then pass that variable to append, it doesn't work and throws an error:-

avro.io.AvroTypeException: The datum {"TransportProtocol": "udp"} is not an example of the schema {

Any help? Thanks.


Solution

  • I think this is because in your first example you actually pass a dictionary, {"TransportProtocol": "udp"}, not a string. In the second one you pass a string, '{"TransportProtocol": "udp"}'.

    Check this out (http://avro.apache.org/docs/1.7.6/gettingstartedpython.html):

    We use DataFileWriter.append to add items to our data file. Avro records are represented as Python dicts.

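    So basically, you can't pass a string as the datum. Parse it into a dict first, for example with json.loads(dummy_json), and pass the result to append.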
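
A minimal sketch of that fix, assuming the same user.avsc and users.avro files as in the question. The helper names to_datum and write_users are hypothetical, introduced here only for illustration. The avro import sits inside write_users so the string-to-dict part can be tried without the package installed. The file is opened in binary mode ("wb") on the assumption that the Avro container format is written as bytes; some older Python 3 releases of the package (avro-python3) spell the schema function avro.schema.Parse instead of parse.

```python
import json


def to_datum(raw):
    """Return a dict suitable for DataFileWriter.append.

    A dict is returned unchanged; a JSON string is parsed into a dict.
    """
    if isinstance(raw, dict):
        return raw
    return json.loads(raw)


dummy_json = '{"TransportProtocol": "udp"}'
datum = to_datum(dummy_json)

# The variable holds a str; the parsed datum is a dict.
print(type(dummy_json).__name__, type(datum).__name__)  # prints: str dict


def write_users(path, schema_path, records):
    """Write each record to an Avro data file (hypothetical helper)."""
    # Imported here so the code above runs even without avro installed.
    import avro.schema
    from avro.datafile import DataFileWriter
    from avro.io import DatumWriter

    with open(schema_path) as f:
        schema = avro.schema.parse(f.read())

    # Binary mode is an assumption: the Avro container file is binary data.
    writer = DataFileWriter(open(path, "wb"), DatumWriter(), schema)
    for raw in records:
        writer.append(to_datum(raw))  # always hand append a dict
    writer.close()
```

With the avro package available, write_users("users.avro", "user.avsc", [dummy_json]) should then behave like the working snippet in the question, because append receives a dict rather than a str.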