I'm encountering a puzzling issue while validating and testing messages against an Avro schema in Google Cloud Pub/Sub. My schema defines a field that holds a nullable list of custom objects.
However, when I try to validate a sample JSON message like this:
{
  "nullable_list": [
    {"field1": 10, "field2": "example1"},
    {"field1": 20, "field2": "example2"}
  ]
}
I'm getting the error message: "The message is not valid according to the schema."
Surprisingly, if I provide a JSON message like this:
{
"nullable_list": null
}
the validator instead reports: "The message is valid according to the schema."
Here's my schema:
{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    {
      "name": "nullable_list",
      "type": [
        "null",
        {
          "type": "array",
          "items": {
            "type": "record",
            "name": "MyObject",
            "fields": [
              {"name": "field1", "type": "int"},
              {"name": "field2", "type": "string"}
            ]
          }
        }
      ],
      "default": null
    }
  ]
}
Can someone please explain why this is happening?
From the Avro JSON Encoding specification:
The value of a union is encoded in JSON as follows:
- if its type is null, then it is encoded as a JSON null;
- otherwise it is encoded as a JSON object with one name/value pair whose name is the type’s name and whose value is the recursively encoded value. For Avro’s named types (record, fixed or enum) the user-specified name is used, for other types the type name is used.
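The nullable_list field is a union of "null" and an array, so a non-null value cannot be written as a bare JSON array. It must be wrapped in an object whose single key is the branch's type name, which for an array is "array". The proper JSON representation of this message is therefore: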
{
  "nullable_list": {
    "array": [
      {
        "field1": 10,
        "field2": "example1"
      },
      {
        "field1": 20,
        "field2": "example2"
      }
    ]
  }
}
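If the messages start out as plain JSON, the wrapping can be applied mechanically by walking the schema. The following is a minimal Python sketch of that idea, not part of any Avro or Pub/Sub library. The helper names (`branch_name`, `matches`, `to_avro_json`) are made up for illustration. It covers only the primitives, records, arrays, and unions used in this question, and it picks the first union branch that fits the value.

```python
import json

# Python types accepted for each Avro primitive handled by this sketch.
PRIMITIVES = {
    "null": type(None),
    "boolean": bool,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "string": str,
}


def branch_name(schema):
    """Key used to wrap a non-null union value: the user-specified name
    for named types (record, enum, fixed), otherwise the type name."""
    if isinstance(schema, str):
        return schema
    if schema["type"] in ("record", "enum", "fixed"):
        return schema["name"]
    return schema["type"]


def matches(schema, value):
    """Rough check of whether a plain Python value fits a union branch."""
    if isinstance(schema, str):
        expected = PRIMITIVES.get(schema)
        if expected is None:
            return False  # type not covered by this sketch
        if expected in (int, float) and isinstance(value, bool):
            return False  # bool is a subclass of int in Python
        if expected is float:
            return isinstance(value, (int, float))
        return isinstance(value, expected)
    kind = schema["type"]
    if kind == "array":
        return isinstance(value, list)
    if kind == "record":
        return isinstance(value, dict)
    return matches(kind, value)  # e.g. {"type": "string"}


def to_avro_json(schema, value):
    """Convert a plain JSON-like value into Avro's JSON encoding."""
    if isinstance(schema, list):  # union
        for branch in schema:
            if matches(branch, value):
                if value is None:
                    return None  # null is encoded as a bare JSON null
                return {branch_name(branch): to_avro_json(branch, value)}
        raise ValueError(f"no union branch matches {value!r}")
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            return {
                f["name"]: to_avro_json(
                    f["type"], value.get(f["name"], f.get("default"))
                )
                for f in schema["fields"]
            }
        if kind == "array":
            return [to_avro_json(schema["items"], item) for item in value]
        return to_avro_json(kind, value)
    return value  # primitives are passed through unchanged


SCHEMA = {
    "type": "record",
    "name": "MyRecord",
    "fields": [
        {
            "name": "nullable_list",
            "type": [
                "null",
                {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "MyObject",
                        "fields": [
                            {"name": "field1", "type": "int"},
                            {"name": "field2", "type": "string"},
                        ],
                    },
                },
            ],
            "default": None,
        }
    ],
}

plain = {
    "nullable_list": [
        {"field1": 10, "field2": "example1"},
        {"field1": 20, "field2": "example2"},
    ]
}

encoded = to_avro_json(SCHEMA, plain)
print(json.dumps(encoded, indent=2))
```

The null case passes through untouched, which is why `{"nullable_list": null}` validated without any wrapping while the bare array did not.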