
How to represent repeated fields in an Avro schema?


My data model has a few fixed fields and a block of variable fields. The variable fields, as a block, can repeat 0 to n times within the same record.

A person object works as an analogy. The name has just one entry in each record, but the person can have 0 to n addresses, and the address field has a structure of its own. Is there a way to repeat the address schema for however many addresses the person has? How do I express this in the Avro schema file?


Solution

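  • Have you tried using a nested Avro schema? Declaring the addresses as an array whose items are a nested record should solve your one-person-multiple-addresses requirement, since an array can hold zero or more items. Here is a schema that would help.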

    {
        "type": "record",
        "name": "person",
        "namespace": "com.testavro",
        "fields": [
            { "name": "personname", "type": ["null", "string"] },
            { "name": "personId", "type": ["null", "string"] },
            {
                "name": "Addresses",
                "type": {
                    "type": "array",
                    "items": [
                        {
                            "type": "record",
                            "name": "Address",
                            "fields": [
                                { "name": "addressLine1", "type": ["null", "string"] },
                                { "name": "addressLine2", "type": ["null", "string"] },
                                { "name": "city", "type": ["null", "string"] },
                                { "name": "state", "type": ["null", "string"] },
                                { "name": "zipcode", "type": ["null", "string"] }
                            ]
                        }
                    ]
                }
            }
        ]
    }
    

    When Java code is generated from the above Avro schema, you get a person class and an Address class. The builder in the autogenerated person class (field declarations only) looks like this:

    /**
     * RecordBuilder for person instances.
     */
    public static class Builder extends org.apache.avro.specific.SpecificRecordBuilderBase<person>
        implements org.apache.avro.data.RecordBuilder<person> {

        private java.lang.String personname;
        private java.lang.String personId;
        // Element type is Object because the schema declares the array items as a union.
        private java.util.List<java.lang.Object> Addresses;

        // ... remaining builder members omitted
    }
    

    The builder in the Address class (field declarations only) looks like this:

    /**
     * RecordBuilder for Address instances.
     */
    public static class Builder extends org.apache.avro.specific.SpecificRecordBuilderBase<Address>
        implements org.apache.avro.data.RecordBuilder<Address> {

        private java.lang.String addressLine1;
        private java.lang.String addressLine2;
        private java.lang.String city;
        private java.lang.String state;
        private java.lang.String zipcode;

        // ... remaining builder members omitted
    }
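
    The `java.util.List<java.lang.Object>` type on `Addresses` follows from the square brackets around the record in `"items"`: they declare a union with a single branch, and the Java code generator maps such a union to `Object` here. As a sketch of an alternative, assuming every item is always an `Address` and a typed list is wanted, the record can be given as the item schema directly. The `"default": []` entry is an added assumption, not part of the schema above; it lets a reader fall back to an empty list when the field is absent from the written data.

    ```json
    {
        "type": "record",
        "name": "person",
        "namespace": "com.testavro",
        "fields": [
            { "name": "personname", "type": ["null", "string"] },
            { "name": "personId", "type": ["null", "string"] },
            {
                "name": "Addresses",
                "type": {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "Address",
                        "fields": [
                            { "name": "addressLine1", "type": ["null", "string"] },
                            { "name": "addressLine2", "type": ["null", "string"] },
                            { "name": "city", "type": ["null", "string"] },
                            { "name": "state", "type": ["null", "string"] },
                            { "name": "zipcode", "type": ["null", "string"] }
                        ]
                    }
                },
                "default": []
            }
        ]
    }
    ```

    With this form the generated field should come out as a list of `Address` rather than `Object`, and a person with no addresses is written with an empty array.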
    

    Is this what you were looking for?