
Why is the Avro default value not used? (with avro-python)


I'm serializing some data with Avro (using the Python library), and I'm having a hard time figuring out how to make the "default" value work.

I have this schema:

{
    "type": "record",
    "fields":[
        {"name": "amount", "type": "long"},
        {"name": "currency", "type": "string", "default": "EUR"}
    ],
    "name": "Monetary"
}

As I understood it, I could pass an amount and no currency, and the currency field would take the value "EUR". However, if I don't pass a "currency" field when writing, I get the error avro.io.AvroTypeException: The datum { ... } is not an example of the schema xxx...

If I change the currency field's type to a union ["string", "null"], then the data is serialized, but currency is null.

So it seems the "default" value is not taken into account at all.

What am I missing? Are default values applicable to primitive types?

Thanks in advance


Solution

  • Here is the relevant quote from the Avro specification:

     default: A default value for this field, used when reading instances that lack this field (optional)
    

    The default value is used when you read an instance that was written with one schema (the writer's schema) and resolve it against another schema (the reader's schema). If the field does not exist in the writer's schema (so the instance lacks this field), the instance you get takes the default value from the reader's schema.

    That's it!

    The default value is not used when you read or write an instance using the same schema.

    So, in your example, once you give the currency field a default value, reading an instance that was written with an older schema that did not contain the currency field gives you an instance containing the default value defined in your schema.
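To make the rule concrete, here is a minimal, library-free sketch of how a reader could resolve a record written with an older schema. It is a toy model, not the avro package's implementation: the function name resolve_record and the older writer schema (without currency) are assumptions made for illustration, and only flat records are handled.

```python
import json

# Hypothetical older schema: the data was written before "currency" existed.
WRITER_SCHEMA = json.loads("""
{
    "type": "record",
    "name": "Monetary",
    "fields": [
        {"name": "amount", "type": "long"}
    ]
}
""")

# The schema from the question, used here as the reader's schema.
READER_SCHEMA = json.loads("""
{
    "type": "record",
    "name": "Monetary",
    "fields": [
        {"name": "amount", "type": "long"},
        {"name": "currency", "type": "string", "default": "EUR"}
    ]
}
""")


def resolve_record(writer_schema, reader_schema, written):
    """Toy model: project a record written with writer_schema onto reader_schema."""
    writer_fields = {field["name"] for field in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # The writer knew this field, so the written value is used.
            result[name] = written[name]
        elif "default" in field:
            # The writer's schema lacks the field: the reader's default applies.
            result[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result


old_record = {"amount": 100}
resolved = resolve_record(WRITER_SCHEMA, READER_SCHEMA, old_record)
print(resolved)
```

The default only comes into play in the branch where the writer's schema has no such field, which is why it never fires when both sides use the same schema.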

    It is worth mentioning that when you use a union, the default value corresponds to the first type of the union.
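Since the default is not applied at write time, one possible workaround is to fill in missing fields yourself before handing the datum to the writer. The helper below is a sketch under that assumption: with_defaults is a hypothetical name, not part of the avro package, and it only looks at the top-level fields of a record schema.

```python
import json

SCHEMA_JSON = """
{
    "type": "record",
    "name": "Monetary",
    "fields": [
        {"name": "amount", "type": "long"},
        {"name": "currency", "type": "string", "default": "EUR"}
    ]
}
"""


def with_defaults(schema, datum):
    """Return a copy of datum with missing top-level fields filled from schema defaults."""
    filled = dict(datum)  # copy, so the caller's dict is left untouched
    for field in schema["fields"]:
        if field["name"] not in filled and "default" in field:
            filled[field["name"]] = field["default"]
    return filled


schema = json.loads(SCHEMA_JSON)
datum = with_defaults(schema, {"amount": 100})
print(datum)
```

The filled datum now contains every field the schema requires, so it can be passed to the writer in the usual way. Values that are explicitly provided are never overwritten.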