python schema avro

Avro - Why is there an option to specify the writer's schema in DatumReader?


When reading an Avro file, there's an option to specify the reader's schema in lieu of the schema that is already embedded in the file.

reader = DataFileReader(data, DatumReader(readers_schema=readers_schema))

What I'm confused about is that there's also the option to specify the writer's schema, i.e.

reader = DataFileReader(data, DatumReader(writers_schema=writers_schema, readers_schema=readers_schema))

Why would this ever be necessary if the writer's schema is already embedded in the file? And if the embedded schema is different from the passed-in writer's schema, what kind of behavior would we see?


Solution

  • If you mean the reference Python implementation, the DatumReader's writer's schema (if any) gets overwritten with the one embedded in the file when you read it through DataFileReader. It looks like there is a TODO to use it to specify the expected schema, but this does not appear to be implemented yet.

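    Specifying a writer's schema with a DatumReader can be useful in general, though. For example, when you read raw binary-encoded data outside of a container file, there is no embedded schema, so the reader has to be told which schema the data was written with, and that schema may differ from the one you want to read the data as.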
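
To illustrate why raw Avro binary data needs an explicit writer's schema, here is a minimal sketch in plain Python. It does not use the avro package: it hand-rolls the Avro binary encoding for only the long and string types, and the schema dicts and the function names (encode_long, decode_long, encode_datum, read_datum) are hypothetical names chosen for this example. The point is that the bytes carry no field names or types, so they can only be decoded by walking the writer's schema, after which the result is mapped onto the reader's schema.

```python
import io

# Hypothetical schemas, written as plain dicts for this sketch.
WRITER_SCHEMA = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
]}
READER_SCHEMA = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string", "default": ""},
]}


def encode_long(n):
    """Encode an int as an Avro long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf):
    """Decode one Avro long from a binary stream."""
    shift = 0
    z = 0
    while True:
        b = buf.read(1)[0]
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def encode_datum(record, schema):
    """Write field values in schema order; no names or types are stored."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "long":
            out += encode_long(value)
        else:  # "string": length prefix, then UTF-8 bytes
            raw = value.encode("utf-8")
            out += encode_long(len(raw)) + raw
    return bytes(out)


def read_datum(payload, writer_schema, reader_schema):
    """Decode with the writer's schema, then map onto the reader's schema."""
    buf = io.BytesIO(payload)
    written = {}
    for field in writer_schema["fields"]:
        if field["type"] == "long":
            written[field["name"]] = decode_long(buf)
        else:
            length = decode_long(buf)
            written[field["name"]] = buf.read(length).decode("utf-8")
    # Keep only the reader's fields; fill missing ones from their defaults.
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        resolved[name] = written[name] if name in written else field["default"]
    return resolved


payload = encode_datum({"id": 42, "name": "Ada"}, WRITER_SCHEMA)
record = read_datum(payload, WRITER_SCHEMA, READER_SCHEMA)
print(record)  # {'name': 'Ada', 'email': ''}
```

Here the payload is just five bytes with no schema attached. Decoding it with READER_SCHEMA alone would misread the first byte (the encoded id) as a string length, which is why both schemas have to be supplied when there is no container file to carry the writer's schema.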