Avro Schema array without name


Question 1

I'm wondering whether the schema below is a valid Avro schema. Note that the first object in the fields array is missing a name.

{
  "name": "AgentRecommendationList",
  "type": "record",
  "fields": [
      {
          "type": {
              "type": "array",
              "items": {
                  "name": "friend",
                  "type": "record",
                  "fields": [
                      {
                          "name": "Name",
                          "type": "string"
                      },
                      {
                          "name": "phoneNumber",
                          "type": "string"
                      },
                      {
                          "name": "email",
                          "type": "string"
                      }
                  ]
              }
          }
      }
  ]
}

It is designed to target data like the following:

[
        {
            "Name": "1",
            "phoneNumber": "2",
            "email": "3"
        },
        {
            "Name": "1",
            "phoneNumber": "2",
            "email": "3"
        },
        {
            "Name": "1",
            "phoneNumber": "2",
            "email": "3"
        }
 ]

Based on the sources below, it seems that an array without a field name like this is not permitted:

Avro Schema failure

There is no way to define an Avro schema with an array without a field name.

https://avro.apache.org/docs/current/spec.html#schema_complex

name: a JSON string providing the name of the field (required), and

I suspect that the schema below is the correct one:

{
  "name": "AgentRecommendationList",
  "type": "record",
  "fields": [
      {
          "name": "friends",
          "type": {
              "type": "array",
              "items": {
                  "name": "friend",
                  "type": "record",
                  "fields": [
                      {
                          "name": "Name",
                          "type": "string"
                      },
                      {
                          "name": "phoneNumber",
                          "type": "string"
                      },
                      {
                          "name": "email",
                          "type": "string"
                      }
                  ]
              }
          }
      }
  ]
}

The data would then need to look like the following for the Avro conversion to succeed:

{
  "friends": [
      {
          "Name": "1",
          "phoneNumber": "2",
          "email": "3"
      },
      {
          "Name": "1",
          "phoneNumber": "2",
          "email": "3"
      },
      {
          "Name": "1",
          "phoneNumber": "2",
          "email": "3"
      }
  ]
}

Question 2

Is the schema below valid? It targets the unnamed array from the first data example.

{
  "name": "AgentRecommendationList",
  "type": "array",
  "items": {
      "name": "friend",
      "type": "record",
      "fields": [
          {
              "name": "Name",
              "type": "string"
          },
          {
              "name": "phoneNumber",
              "type": "string"
          },
          {
              "name": "email",
              "type": "string"
          }
      ]
   }
}

I would appreciate it if anyone could confirm my understanding. Thanks!


Solution

  • For question 1...

    Everything you have written is right. The first schema, as you mentioned, is not valid because each field within a record needs to have a name. The corrected schema is valid and the corrected data is right for the updated schema.

    For question 2...

    The schema in question two is valid, but the AgentRecommendationList name will get ignored. Arrays don't have names. This might sound strange after looking at the examples in question one, but in those the name is part of the field specification, not the array.
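
As a rough illustration of both answers, here is a minimal Python sketch that walks a schema and lists the record fields that lack a name. It is not the Avro library's own validation: `unnamed_fields` is a hypothetical helper written for this example, it checks only the "fields need names" rule, and it assumes the schema has already been loaded as plain Python dicts and lists.

```python
def unnamed_fields(schema, path="$"):
    """Return paths of record fields that lack a "name".

    Hypothetical helper for illustration only; not part of any Avro library.
    """
    problems = []
    if isinstance(schema, list):
        # A union is a list of schemas: check each branch.
        for i, branch in enumerate(schema):
            problems += unnamed_fields(branch, f"{path}[{i}]")
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for i, field in enumerate(schema.get("fields", [])):
                field_path = f"{path}.fields[{i}]"
                if "name" not in field:
                    problems.append(field_path)
                # A field's "type" is itself a schema, so recurse into it.
                problems += unnamed_fields(field.get("type"), field_path + ".type")
        elif kind == "array":
            # Arrays have no name of their own; only "items" is inspected.
            problems += unnamed_fields(schema.get("items"), path + ".items")
        elif kind == "map":
            problems += unnamed_fields(schema.get("values"), path + ".values")
        elif isinstance(kind, (dict, list)):
            problems += unnamed_fields(kind, path + ".type")
    return problems


friend = {
    "name": "friend",
    "type": "record",
    "fields": [
        {"name": "Name", "type": "string"},
        {"name": "phoneNumber", "type": "string"},
        {"name": "email", "type": "string"},
    ],
}

# First schema from question 1: the field holding the array has no name.
first_schema = {
    "name": "AgentRecommendationList",
    "type": "record",
    "fields": [{"type": {"type": "array", "items": friend}}],
}

# Corrected schema from question 1: the field is named "friends".
fixed_schema = {
    "name": "AgentRecommendationList",
    "type": "record",
    "fields": [{"name": "friends", "type": {"type": "array", "items": friend}}],
}

# Schema from question 2: a top-level array, whose "name" is not needed.
array_schema = {"name": "AgentRecommendationList", "type": "array", "items": friend}

print(unnamed_fields(first_schema))  # ['$.fields[0]']
print(unnamed_fields(fixed_schema))  # []
print(unnamed_fields(array_schema))  # []
```

The array branch never looks at a "name" key, which mirrors the point above: the name in question 1 belongs to the record field that wraps the array, not to the array itself.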