apache-kafka apache-storm avro

Deserialize Avro Data In Memory Using Python


We are working on connecting Storm with Kafka.

In our setup Kafka stores messages in Avro.

We are using a Storm wrapper called "Pyleus", and the Avro data arrives in the bolt as a variable.

Question: How can we deserialize Avro data held in a variable using any of the Python Avro modules out there? There are tons of examples for deserializing Avro directly from .avro files. However, our use case has a performance requirement, so we cannot first write the data to a file and then parse it.

Any help, documentation, and/or examples would be appreciated.


Solution

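  • Assuming you have loaded your schema into 'schema' and you have the Avro data in 'raw_bytes', the code below might help. Wrapping the bytes in io.BytesIO gives the decoder a file-like object, so nothing is written to disk: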

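    import io
    import avro.io

    bytes_reader = io.BytesIO(raw_bytes)           # file-like view over the in-memory bytes
    decoder = avro.io.BinaryDecoder(bytes_reader)
    reader = avro.io.DatumReader(schema)           # 'schema' is the already-parsed Avro schema
    decoded_data = reader.read(decoder)            # the decoded datum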
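
To see why no intermediate file is needed, it may help to look at what a binary decoder does with the in-memory stream. The following is a minimal, standard-library-only sketch (Python 3 assumed), not the avro package's implementation. It hand-decodes a hypothetical record with a string field "name" followed by a long field "age", using the Avro binary encoding rules: longs are zigzag varints, and strings are a long length followed by UTF-8 bytes. The helper names read_long, read_string and decode_user are made up for illustration.

```python
import io


def read_long(stream):
    """Read one Avro int/long: a zigzag-encoded variable-length integer."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro data")
        value = byte[0]
        accum |= (value & 0x7F) << shift   # low 7 bits carry payload
        if not (value & 0x80):             # high bit clear: last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)     # undo zigzag encoding


def read_string(stream):
    """Read one Avro string: a long byte count, then that many UTF-8 bytes."""
    length = read_long(stream)
    return stream.read(length).decode("utf-8")


def decode_user(raw_bytes):
    """Decode a hypothetical record {name: string, age: long} from bytes."""
    stream = io.BytesIO(raw_bytes)         # in-memory, file-like object
    # Record fields are simply concatenated in schema order.
    return {"name": read_string(stream), "age": read_long(stream)}


# "Al" has length 2 (zigzag 0x04); age 30 is zigzag 60 (0x3C).
example = decode_user(b"\x04Al\x3c")
```

In practice the avro package's BinaryDecoder and DatumReader do this work for any schema; the sketch only illustrates that the decoder needs a readable stream, which io.BytesIO provides without touching disk.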