How does Apache AVRO serialize (large) data structures?

I'm looking at using AVRO on Hadoop, but I am concerned about the serialization of large data structures and about how to add methods to the (data) classes.

The example (taken from shows a model of facebook users.

{
  "namespace": "test.avro",
  "name": "FacebookUser",
  "type": "record",
  "fields": [
      {"name": "name", "type": "string"},
      {"name": "friends", "type": {"type": "array", "items": "FacebookUser"}}
  ]
}

Does avro serialize the complete social graph of a facebookuser in this model?

[That is, if I want to serialize one user, does the serialization include all its friends, their friends, and so on?]

If the answer is yes, I'd rather store the IDs of friends instead of references and look them up in my application whenever needed. In that case I would like to be able to add a method that returns the actual friends instead of their IDs.

How can I wrap/extend generated AVRO java classes to add methods?

(also to add methods that return for example friend-count)
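
The ID-based design described above could look roughly like the following sketch. It is plain Java without the Avro dependency: UserRecord is a hypothetical stand-in for a generated class that stores friend IDs, and UserDirectory and User are likewise made-up names for the application-side lookup and the wrapper that adds the extra methods by delegation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FriendLookupSketch {

    // Hypothetical stand-in for a generated record: it holds friend IDs, not nested users.
    static final class UserRecord {
        final long id;
        final String name;
        final List<Long> friendIds;

        UserRecord(long id, String name, List<Long> friendIds) {
            this.id = id;
            this.name = name;
            this.friendIds = friendIds;
        }
    }

    // Hypothetical application-side lookup; a real one might query a database or cache.
    static final class UserDirectory {
        private final Map<Long, UserRecord> byId = new HashMap<>();

        void add(UserRecord user) {
            byId.put(user.id, user);
        }

        UserRecord find(long id) {
            return byId.get(id);
        }
    }

    // Wrapper that adds behaviour by delegation; the record class itself stays untouched.
    static final class User {
        private final UserRecord record;
        private final UserDirectory directory;

        User(UserRecord record, UserDirectory directory) {
            this.record = record;
            this.directory = directory;
        }

        String getName() {
            return record.name;
        }

        // Counts the stored IDs, whether or not they can be resolved.
        int getFriendCount() {
            return record.friendIds.size();
        }

        // Resolves IDs on demand; IDs unknown to the directory are skipped.
        List<User> getFriends() {
            List<User> friends = new ArrayList<>();
            for (long friendId : record.friendIds) {
                UserRecord friend = directory.find(friendId);
                if (friend != null) {
                    friends.add(new User(friend, directory));
                }
            }
            return friends;
        }
    }

    public static void main(String[] args) {
        UserDirectory directory = new UserDirectory();
        directory.add(new UserRecord(1L, "Alice", Arrays.asList(2L, 3L)));
        directory.add(new UserRecord(2L, "Bob", Arrays.asList(1L)));
        directory.add(new UserRecord(3L, "Carol", new ArrayList<Long>()));

        User alice = new User(directory.find(1L), directory);
        for (User friend : alice.getFriends()) {
            System.out.println(alice.getName() + " is friends with " + friend.getName());
        }
    }
}
```

With this shape, serializing one user writes only that user's name and a list of IDs; the friends themselves are fetched by the application when getFriends() is called.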


  • Regarding the second question: How can I wrap/extend generated AVRO Java classes to add methods?

    You can use AspectJ to inject new methods into an existing/generated class. AspectJ is required only at compile time. The approach is illustrated below.

    Define a Person record as Avro IDL (person.avdl):

    @namespace("net.tzolov.avro.extend")
    protocol PersonProtocol {
        record Person {
            string firstName;
            string lastName;
        }
    }

    Use Maven and the avro-maven-plugin (its idl-protocol goal handles .avdl files) to generate Java sources from the AVDL. The configuration used here presumes that the person.avdl file is in src/main/resources/avro. Sources are generated in target/generated-sources/java.

    The generated Person class has two methods: getFirstName() and getLastName(). If you want to extend it with another method, getCompleteName(), which returns firstName + " " + lastName, you can inject that method with the following aspect:

    package net.tzolov.avro.extend;

    import net.tzolov.avro.extend.Person;

    public aspect PersonAspect {
        // Inter-type declaration: adds getCompleteName() to the generated Person class.
        public String Person.getCompleteName() {
            return this.getFirstName() + " " + this.getLastName();
        }
    }

    Use the aspectj-maven-plugin to weave this aspect into the generated code.


    and the result:

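    @Test
    public void testPersonCompleteName() throws Exception {
        // Build a Person with the generated builder, then call the woven-in method.
        Person person = Person.newBuilder()
                .setFirstName("John").setLastName("Atanasoff").build();
        Assert.assertEquals("John Atanasoff", person.getCompleteName());
    }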