python avro fastavro

Deserialization, fixed data type in Avro


I am new to Avro and I have an Avro file to deserialize. Some schemas use the fixed type to store MAC addresses. The schema below is one of them, and it is used as a type inside other schemas.

The schema for MAC addresses looks like this:

{
    "type": "fixed",
    "name": "MacAddress",
    "size": 6
}

I wrote the first record of the data to a text file using:

from avro.datafile import DataFileReader
from avro.io import DatumReader

reader = DataFileReader(open("data.avro", "rb"), DatumReader())
for record in reader:
    # Only the first record is needed, so write it and stop iterating.
    with open("first_record.txt", "w") as first_record:
        first_record.write(str(record))
    break
reader.close()

The MAC addresses mentioned above appear in the deserialized data like this:

"MacAddress":"b""\\x36\\xe9\\xad\\x64\\x2d\\x3d",

I know that \x means the following two characters are a hexadecimal value. So this is supposed to be "36:e9:ad:64:2d:3d", right? Are "b""" style values the expected output for fixed types?

Also, some values look like this:

"Addr":"b""j\\x26\\xb7\\xda\\x1d\\xf6"

"Addr":"b""\\x28\\xcb\\xc5v\\x14%" 

How can these be MAC addresses? What do the j and % characters mean?


Solution

  • Are "b""" style values the expected output for fixed types?

    Yes. Fixed types hold raw bytes, and in Python a bytes object is displayed with a b prefix before the quoted string. It looks like you have a lot of extra quotes in there, and I'm guessing that's because you are doing things like str(record), which is probably what produces the extra backslashes and quote characters. For example:

    
    >>> str(b"\xae")
    "b'\\xae'"
    
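One way to avoid the repr noise is to format the bytes yourself instead of calling str() on the whole record. The following is a minimal sketch, not the only approach: `bytes_to_mac` is a hypothetical helper name, and `record` is a hand-written dict standing in for one deserialized record. It relies on `json.dumps` calling its `default` function for values it cannot serialize, which includes bytes.

```python
import json


def bytes_to_mac(value):
    """Fallback for json.dumps: render bytes as colon-separated hex."""
    if isinstance(value, (bytes, bytearray)):
        # Iterating over bytes yields integers, formatted here as two hex digits.
        return ":".join(f"{b:02x}" for b in value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# Hypothetical record shaped like the one in the question.
record = {"MacAddress": b"\x36\xe9\xad\x64\x2d\x3d"}

text = json.dumps(record, default=bytes_to_mac)
print(text)  # {"MacAddress": "36:e9:ad:64:2d:3d"}
```

Writing `text` to the file instead of `str(record)` would give the colon-separated form directly.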

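  • How come these are MAC addresses? What does j, % characters means?

    Are you sure these are the same record type? The key is Addr instead of MacAddress, so it seems like it might be a different record type and schema. As for the characters: when Python displays bytes, any byte that happens to be a printable ASCII character is shown as that character rather than as a \x escape. So j is the byte 0x6a, v is 0x76, and % is 0x25, and both values are still six bytes long.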
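To check this yourself, here is a small sketch that takes the two Addr values from the question, written as Python bytes literals, and prints each one's length and hex form. The variable names are only for illustration.

```python
# The two Addr values from the question, as Python bytes literals.
addr_1 = b"j\x26\xb7\xda\x1d\xf6"
addr_2 = b"\x28\xcb\xc5v\x14%"

for addr in (addr_1, addr_2):
    # Each byte is an integer; format it as two lowercase hex digits.
    print(len(addr), ":".join(f"{b:02x}" for b in addr))

# Prints:
# 6 6a:26:b7:da:1d:f6
# 6 28:cb:c5:76:14:25
```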