How can I remove the redundant nested array in my generated Avro message?


I have an Avro schema, part of which looks like this:

{
        "name": "products",
        "type": [
          "null",
          {
            "type": "array",
            "items": {
              "name": "products_record",
              "type": "record",
              "fields": [
                {
                  "name": "id",
                  "type": {
                    "type": "string",
                    "logicalType": "uuid"
                  }
                },
                {
                  "name": "status",
                  "type": "string"
                },
                {
                  "name": "error",
                  "type": [
                    "string",
                    "null"
                  ]
                }
              ]
            }
          }
        ],
        "default": null
      }

My intention is to end up with a message that has a products key whose value is an array of items, each of which has an id, a status, and potentially an error.

What I end up with instead is a message that has a products key, but the value is an object that has a key of array, and then that key has the array value I'm expecting.

E.g.

{
  "products": {
    "array": [
      {
        "id": "100",
        "status": "VALID",
        "error": null
      },
      {
        "error": null,
        "id": "200",
        "status": "VALID"
      }
    ]
  }
}

Is there a way to change my schema to get rid of the intermediate array key, so that messages encoded with it look like this instead?

{
  "products": [
    {
      "id": "100",
      "status": "VALID",
      "error": null
    },
    {
      "error": null,
      "id": "200",
      "status": "VALID"
    }
  ]
}


Solution

  • change my schema so that I can get rid of the intermediate array key

    Don't make the field a union type (that is, don't make it nullable).

    {
        "name": "products",
        "type": {
          "type": "array",
          ...
    

    You can use an empty list to signify no data, and convert it back to null during deserialization if needed.
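A minimal sketch of that empty-list convention, in Python. The helper names `to_wire` and `from_wire` are hypothetical and not part of any Avro library; they only show the mapping applied before encoding and after decoding. Note the trade-off: a genuinely empty products list can no longer be told apart from "no products value".

```python
def to_wire(products):
    """Before encoding: replace a missing value with an empty list,
    since the non-nullable array field cannot hold null."""
    return {"products": [] if products is None else products}


def from_wire(message):
    """After decoding: treat an empty (or absent) list as 'no data'."""
    products = message.get("products")
    return products if products else None


# Round trip for both cases (sample values are illustrative only).
print(from_wire(to_wire(None)))
print(from_wire(to_wire([{"id": "100", "status": "VALID", "error": None}])))
```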

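    See the spec: in the JSON encoding, a non-null union value is written as a JSON object with a single key naming the value's type, which is where the array key comes from - https://avro.apache.org/docs/current/spec.html#json_encoding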
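To illustrate the rule, here is a small Python sketch that mimics how the JSON encoding wraps a union value. `encode_union` and `unwrap_union` are hypothetical helpers written for this answer, not functions from an Avro library, and they only handle the wrapping step, not full schema-driven encoding.

```python
def encode_union(value, branch_name):
    """Mimic the JSON encoding of a union value.

    null is written as plain null; any other value is wrapped in an
    object whose single key is the name of the branch type.
    """
    if value is None:
        return None
    return {branch_name: value}


def unwrap_union(encoded):
    """Reverse the wrapping: return the inner value, or None for null."""
    if encoded is None:
        return None
    # A wrapped union value has exactly one key/value pair.
    ((_, value),) = encoded.items()
    return value


items = [{"id": "100", "status": "VALID", "error": None}]
wrapped = encode_union(items, "array")  # same shape as the message in the question
print(wrapped)
print(unwrap_union(wrapped) == items)
```

If the schema has to stay nullable, unwrapping on the consumer side along these lines is an alternative to changing the schema.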