
Avro: multiple records of the same type in a single schema


I would like to use the same record type multiple times in an Avro schema. Consider this schema definition:

{
    "type": "record",
    "name": "OrderBook",
    "namespace": "my.types",
    "doc": "Test order update",
    "fields": [
        {
            "name": "bids",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "OrderBookVolume",
                    "namespace": "my.types",
                    "fields": [
                        {
                            "name": "price",
                            "type": "double"
                        },
                        {
                            "name": "volume",
                            "type": "double"
                        }
                    ]
                }
            }
        },
        {
            "name": "asks",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "OrderBookVolume",
                    "namespace": "my.types",
                    "fields": [
                        {
                            "name": "price",
                            "type": "double"
                        },
                        {
                            "name": "volume",
                            "type": "double"
                        }
                    ]
                }
            }
        }
    ]
}

This is not a valid Avro schema, and the Avro schema parser fails with:

org.apache.avro.SchemaParseException: Can't redefine: my.types.OrderBookVolume

I can fix this by making the type unique by moving the OrderBookVolume into two different namespaces:

{
    "type": "record",
    "name": "OrderBook",
    "namespace": "my.types",
    "doc": "Test order update",
    "fields": [
        {
            "name": "bids",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "OrderBookVolume",
                    "namespace": "my.types.bid",
                    "fields": [
                        {
                            "name": "price",
                            "type": "double"
                        },
                        {
                            "name": "volume",
                            "type": "double"
                        }
                    ]
                }
            }
        },
        {
            "name": "asks",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "OrderBookVolume",
                    "namespace": "my.types.ask",
                    "fields": [
                        {
                            "name": "price",
                            "type": "double"
                        },
                        {
                            "name": "volume",
                            "type": "double"
                        }
                    ]
                }
            }
        }
    ]
}

This is not a valid solution, because Avro code generation would produce two different classes. That is very annoying if I want to use the type for other things as well, not just for serialization and deserialization.

This problem is related to Avro Spark issue #73, which added differentiation of nested records with the same name by prefixing the namespace with the outer record names. Their use case may be purely storage related, so it may work for them, but it does not work for us.

Does anybody know a better solution? Is this a hard limitation of Avro?


Solution

  • It's not well documented, but Avro lets you reference a previously defined named type by its fully qualified name (namespace plus name). The type has to be defined before the first place it is referenced. In your case, the following schema results in only one generated class, which both arrays reference. It also DRYs up the schema nicely.

    {
        "type": "record",
        "name": "OrderBook",
        "namespace": "my.types",
        "doc": "Test order update",
        "fields": [
            {
                "name": "bids",
                "type": {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "OrderBookVolume",
                        "namespace": "my.types.bid",
                        "fields": [
                            {
                                "name": "price",
                                "type": "double"
                            },
                            {
                                "name": "volume",
                                "type": "double"
                            }
                        ]
                    }
                }
            },
            {
                "name": "asks",
                "type": {
                    "type": "array",
                    "items": "my.types.bid.OrderBookVolume"
                }
            }
        ]
    }
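
The naming rule behind this answer can be sketched with nothing but the Python standard library. The checker below is a simplified model of Avro's name handling, not the Avro parser itself: `check_names`, `order_book`, and `VOLUME` are hypothetical names introduced for illustration. It also assumes a small variant of the schema in which `OrderBookVolume` has no namespace of its own, so it inherits `my.types` from the enclosing record and can be referenced as `my.types.OrderBookVolume`.

```python
# Simplified model of Avro's "define once, then reference by full name" rule.
# This is NOT the Avro parser; it only tracks named types.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def check_names(schema, namespace=None, defined=None):
    """Walk a schema depth-first and return the full names defined, in order.

    Raises ValueError on a redefinition or on a reference to an undefined name.
    """
    if defined is None:
        defined = []
    if isinstance(schema, str):  # a primitive or a reference to a named type
        if schema not in PRIMITIVES:
            # A simple (undotted) reference resolves in the enclosing namespace.
            full = schema if "." in schema or not namespace else f"{namespace}.{schema}"
            if full not in defined:
                raise ValueError(f"Undefined name: {schema}")
    elif isinstance(schema, list):  # a union: check every branch
        for branch in schema:
            check_names(branch, namespace, defined)
    elif schema["type"] in ("record", "enum", "fixed"):  # a named type definition
        name = schema["name"]
        if "." in name:  # a dotted name carries its own namespace
            ns, full = name.rsplit(".", 1)[0], name
        else:  # otherwise use the explicit or the inherited namespace
            ns = schema.get("namespace", namespace)
            full = f"{ns}.{name}" if ns else name
        if full in defined:
            raise ValueError(f"Can't redefine: {full}")
        defined.append(full)
        for field in schema.get("fields", []):
            check_names(field["type"], ns, defined)
    elif schema["type"] == "array":
        check_names(schema["items"], namespace, defined)
    elif schema["type"] == "map":
        check_names(schema["values"], namespace, defined)
    else:  # e.g. {"type": "double"}
        check_names(schema["type"], namespace, defined)
    return defined


# Assumed variant: no "namespace" here, so the record inherits "my.types".
VOLUME = {
    "type": "record",
    "name": "OrderBookVolume",
    "fields": [
        {"name": "price", "type": "double"},
        {"name": "volume", "type": "double"},
    ],
}


def order_book(asks_items):
    """Build the OrderBook schema with the given item type for "asks"."""
    return {
        "type": "record",
        "name": "OrderBook",
        "namespace": "my.types",
        "fields": [
            {"name": "bids", "type": {"type": "array", "items": VOLUME}},
            {"name": "asks", "type": {"type": "array", "items": asks_items}},
        ],
    }


duplicated = order_book(VOLUME)                  # second full definition
shared = order_book("my.types.OrderBookVolume")  # reference by full name

print(check_names(shared))
try:
    check_names(duplicated)
except ValueError as err:
    print(err)
```

In this model the `shared` schema defines `my.types.OrderBookVolume` exactly once, while `duplicated` is rejected for defining it twice, which mirrors the `Can't redefine` error from the question. Whether the shared type should live in `my.types` or in `my.types.bid` is a design choice; keeping it in the outer namespace avoids tying the generated class name to one of the two fields.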