avro AVSC (json) file array or schema re-use


This is related to a question that's been asked myriad times but I can't find a clear answer.

In the AVSC (JSON) syntax for defining Avro schemas, there is no 'import' capability, so there seems to be no clear way to define a schema once and reference it elsewhere. (I recognize that AVDL supports import, but the Java parser doesn't yet allow uuid types; that has been patched and will be fixed in 1.11.)

I see a lot of answers to 'how to re-use schemas' that rely on the Avro Maven plugin to define 'includes', which is great if you're using Java, but I'm working in a polyglot environment.

I have toyed around with the following syntax in an AVSC file. It works for me with Maven/Java but seems entirely undocumented:

[
  // ^ note: starts with a top-level array
  {
    // schema 1

    "type": "record",
    "namespace": "com.mycompany",
    "name": "Money",
    "fields": [
      {
        "name": "amount",
        "type": {
          "type": "bytes",
          "logicalType": "decimal",
          "scale": 2,
          "precision": 19
        }
      },
      {
        "name": "currency",
        "type": "string",
        "doc": "3-character ISO 4217 currency code"
      }
    ]
  },
  {
    // schema 2, references schema 1
    
    "type": "record",
    "namespace": "com.mycompany.budgeting",
    "name": "BudgetsModified",
    "fields": [
      {
        "name": "id",
        "type": {
          "type": "string",
          "logicalType": "uuid"
        }
      },
      {
        "name": "amount",
        // re-use
        "type": "com.mycompany.Money"
      }
    ]
  }
]

But is this actually supported, or is it just a quirk of the Avro Maven plugin?

The problems I'm specifically looking to address:

  • must support Java, Python, and (ideally) TypeScript (so a Maven-only solution is no good)
  • must support schema referencing/re-use -- I don't want to redefine a custom tuple like Money (a decimal with a currency) in every schema that requires it
  • must interoperate with Confluent Schema Registry

Solution

  • "But is this actually supported, or is it just a quirk of the Avro Maven plugin?"

    This is a completely valid way of combining and referencing schemas. Per the Avro specification, a top-level JSON array is a union, and a named type can be referenced by its full name anywhere after its definition. In fact, the Python fastavro library has a load_schema API that originally did essentially this: it loaded all the schemas into a list (an Avro union), because that was a correct and easy way to solve the problem.

    As for the requirement to interoperate with Confluent Schema Registry: I don't know how the registry works or whether it supports this kind of unioned schema, but I would hope so, since the schema is valid.
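
To illustrate why the order of the array matters, here is a minimal, stdlib-only Python sketch of the rule from the Avro specification that a name must be defined before it is used (in a depth-first, left-to-right traversal). This is not how fastavro or the Java parser is actually implemented; the helpers `fullname` and `walk` are hypothetical names, and the sketch ignores aliases and most other validation.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_TYPES = {"record", "error", "enum", "fixed"}


def fullname(name, namespace):
    """Qualify a name with the enclosing namespace unless it is already dotted."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def walk(schema, namespace, known):
    """Visit a schema depth-first, left to right, collecting defined full names.

    Raises ValueError if a named type is referenced before it is defined
    or is defined twice. `known` is the list of full names seen so far.
    """
    if isinstance(schema, str):
        if schema not in PRIMITIVES:
            ref = fullname(schema, namespace)
            if ref not in known:
                raise ValueError(f"{ref} is used before it is defined")
    elif isinstance(schema, list):
        # A JSON array is a union; its branches are visited in order.
        for branch in schema:
            walk(branch, namespace, known)
    elif isinstance(schema, dict):
        kind = schema["type"]
        if kind in NAMED_TYPES:
            name = fullname(schema["name"], schema.get("namespace", namespace))
            if name in known:
                raise ValueError(f"{name} is defined twice")
            known.append(name)
            # Fields inherit the namespace of the enclosing named type.
            inner_ns = name.rpartition(".")[0] or None
            for field in schema.get("fields", []):
                walk(field["type"], inner_ns, known)
        elif kind == "array":
            walk(schema["items"], namespace, known)
        elif kind == "map":
            walk(schema["values"], namespace, known)
        else:
            # e.g. {"type": "string", "logicalType": "uuid"}
            walk(kind, namespace, known)
    return known


# The two records from the question, without the // comments.
SCHEMAS = [
    {
        "type": "record",
        "namespace": "com.mycompany",
        "name": "Money",
        "fields": [
            {"name": "amount", "type": {"type": "bytes", "logicalType": "decimal",
                                        "scale": 2, "precision": 19}},
            {"name": "currency", "type": "string"},
        ],
    },
    {
        "type": "record",
        "namespace": "com.mycompany.budgeting",
        "name": "BudgetsModified",
        "fields": [
            {"name": "id", "type": {"type": "string", "logicalType": "uuid"}},
            {"name": "amount", "type": "com.mycompany.Money"},
        ],
    },
]

print(walk(SCHEMAS, None, []))
# ['com.mycompany.Money', 'com.mycompany.budgeting.BudgetsModified']
```

Reversing the array (putting BudgetsModified first) makes `walk` raise, because `com.mycompany.Money` would then be referenced before its definition. That is why Money has to come first in the AVSC file.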