Tags: python, binary, deserialization, avro, apache-kafka-connect

python - deserialise avro byte logical type decimal to decimal


I am trying to read an Avro file using the python avro library (python 2). When I use the following code:

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter, BinaryDecoder
reader = DataFileReader(open("filename.avro", "rb"), DatumReader())
schema = reader.meta

Then it reads every column correctly, except for one which remains as bytes, rather than the expected decimal values.

How can I convert this column to the expected decimal values? I notice that the file's metadata identifies the column as 'type': 'bytes', but 'logicalType': 'decimal'.

I post below the metadata for this column, as well as the byte values (the expected actual values are all multiples of 1,000 less than 25,000). The file was created using Kafka.

Metadata:

 {
                            "name": "amount",
                            "type": {
                                "type": "bytes",
                                "scale": 8,
                                "precision": 20,
                                "connect.version": 1,
                                "connect.parameters": {
                                    "scale": "8",
                                    "connect.decimal.precision": "20"
                                },
                                "connect.name": "org.apache.kafka.connect.data.Decimal",
                                "logicalType": "decimal"
                            }
                        }

Byte values:

'E\xd9d\xb8\x00'
'\x00\xe8\xd4\xa5\x10\x00'
'\x01\x17e\x92\xe0\x00'
'\x01\x17e\x92\xe0\x00'

Expected values:

3,000.00
10,000.00
12,000.00
5,000.00

I need to use this within a Lambda function deployed on AWS, so I cannot use fastavro or other libraries that rely on C rather than pure Python.

See the links below:
https://pypi.org/project/fastavro/
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html


Solution

  • To do this you will need to use the fastavro library. Neither the avro nor the avro-python3 library supported logical types at the time of posting.
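
If pulling in another library is not an option, the raw bytes can also be converted by hand. The Avro specification encodes a decimal as the big-endian two's-complement representation of the unscaled integer, and the 'scale' from the schema (8 in the metadata above) says how many places to shift the decimal point. The following is a minimal sketch, not a tested drop-in: it assumes Python 3 (int.from_bytes does not exist in Python 2), and decode_avro_decimal is a hypothetical helper name, not part of any Avro library.

```python
from decimal import Decimal


def decode_avro_decimal(raw, scale):
    """Convert Avro 'decimal' bytes to a Decimal.

    raw   -- big-endian two's-complement bytes of the unscaled integer
    scale -- number of digits after the decimal point (from the schema)
    """
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # scaleb shifts the exponent; the default 28-digit context is enough
    # for the precision of 20 declared in the schema above.
    return Decimal(unscaled).scaleb(-scale)


# The first three byte values from the question, written as Python 3 bytes.
samples = [
    b"E\xd9d\xb8\x00",
    b"\x00\xe8\xd4\xa5\x10\x00",
    b"\x01\x17e\x92\xe0\x00",
]

amounts = [decode_avro_decimal(raw, scale=8) for raw in samples]
for amount in amounts:
    print(amount)
```

With the reader from the question, the same helper could be applied to the "amount" field of each record as it is read, passing the scale taken from the schema.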