
How does Apache Avro serialize (large) data structures?


I'm looking at using Avro on Hadoop, but I am concerned about how it serializes large data structures and how to add methods to the generated (data) classes.

The example (taken from http://blog.voidsearch.com/bigdata/apache-avro-in-practice/) shows a model of Facebook users.

{
  "namespace": "test.avro",
  "name": "FacebookUser",
  "type": "record",
  "fields": [
      {"name": "name", "type": "string"},
      ...,
      {"name": "friends", "type": {"type": "array", "items": "FacebookUser"}}
  ]
}

Does Avro serialize the complete social graph of a Facebook user in this model?

[That is, if I want to serialize one user, does the serialization include all its friends, their friends, and so on?]

If the answer is yes, I'd rather store the IDs of friends instead of references and look them up in my application whenever needed. In that case I would like to be able to add a method that returns the actual friends instead of the IDs.

How can I wrap/extend generated Avro Java classes to add methods?

(Also to add methods that return, for example, the friend count.)


Solution

  • Regarding the second question: How can I wrap/extend generated AVRO java classes to add methods?

    You can use AspectJ to inject new methods into an existing/generated class. AspectJ is required only at compile time. The approach is illustrated below.

    Define a Person record as Avro IDL (person.avdl):

    @namespace("net.tzolov.avro.extend")
    protocol PersonProtocol {
        record Person {
            string firstName;
            string lastName;
        }     
    }
    

    Use Maven and the avro-maven-plugin to generate Java sources from the AVDL:

    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>1.6.3</version>
    </dependency>
        ......
        <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>1.6.3</version>
            <executions>
                <execution>
                    <id>generate-avro-sources</id>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>idl-protocol</goal>
                    </goals>
                    <configuration>
                        <sourceDirectory>src/main/resources/avro</sourceDirectory>
                        <outputDirectory>${project.build.directory}/generated-sources/java</outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    

    The above configuration presumes that the person.avdl file is in src/main/resources/avro. Sources are generated in target/generated-sources/java.

    The generated Person.java has the getters getFirstName() and getLastName(). To extend it with another method, say getCompleteName(), which returns the first and last name separated by a space, you can inject that method with the following aspect:

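    package net.tzolov.avro.extend;
    
    // Person is in the same package, so no import is needed.
    public aspect PersonAspect {
    
        // Inter-type declaration: adds getCompleteName() to the generated Person class.
        public String Person.getCompleteName() {
            return this.getFirstName() + " " + this.getLastName();
        }
    }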
    

    Use the aspectj-maven-plugin to weave this aspect into the generated code:

    <dependency>
        <groupId>org.aspectj</groupId>
        <artifactId>aspectjrt</artifactId>
        <version>1.6.12</version>
    </dependency>
    <dependency>
        <groupId>org.aspectj</groupId>
        <artifactId>aspectjweaver</artifactId>
        <version>1.6.12</version>
    </dependency>
        ....
    <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>aspectj-maven-plugin</artifactId>
        <version>1.2</version>
        <dependencies>
            <dependency>
                <groupId>org.aspectj</groupId>
                <artifactId>aspectjrt</artifactId>
                <version>1.6.12</version>
            </dependency>
            <dependency>
                <groupId>org.aspectj</groupId>
                <artifactId>aspectjtools</artifactId>
                <version>1.6.12</version>
            </dependency>
        </dependencies>
        <executions>
            <execution>
                <goals>
                    <goal>compile</goal>
                    <goal>test-compile</goal>
                </goals>
            </execution>
        </executions>
        <configuration>
            <source>6</source>
            <target>6</target>
        </configuration>
    </plugin>
    

    After weaving, the injected method can be called like any other method of Person:

    @Test
    public void testPersonCompleteName() throws Exception {
    
        Person person = Person.newBuilder()
                .setFirstName("John").setLastName("Atanasoff").build();
    
        Assert.assertEquals("John Atanasoff", person.getCompleteName());
    }
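
If adding AspectJ to the build is not an option, plain composition may also do the job. The following is only a minimal sketch under stated assumptions: the friends field in the schema has been changed to an array of long IDs, as the question proposes, and the record is wrapped in an ordinary Java class. FacebookUserRecord is a hand-written stand-in for the generated class (a real one would be produced by the avro-maven-plugin), and FacebookUserView, getFriendIds() and the lookup function are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical stand-in for an Avro-generated record whose "friends" field
// holds long IDs instead of nested FacebookUser records. A real generated
// class would come from the Avro compiler; this stub only mimics its getters.
class FacebookUserRecord {
    private final long id;
    private final String name;
    private final List<Long> friendIds;

    FacebookUserRecord(long id, String name, List<Long> friendIds) {
        this.id = id;
        this.name = name;
        this.friendIds = friendIds;
    }

    long getId() { return id; }
    String getName() { return name; }
    List<Long> getFriendIds() { return friendIds; }
}

// Wrapper that adds behaviour by composition, leaving the record class untouched.
class FacebookUserView {
    private final FacebookUserRecord record;
    // Assumed application-side lookup (cache, database, ...) from ID to record;
    // it is expected to return null for unknown IDs.
    private final LongFunction<FacebookUserRecord> lookup;

    FacebookUserView(FacebookUserRecord record, LongFunction<FacebookUserRecord> lookup) {
        this.record = record;
        this.lookup = lookup;
    }

    // Gives access to the underlying record, e.g. for serialization.
    FacebookUserRecord getRecord() { return record; }

    // Counts the stored IDs without resolving any of them.
    int getFriendCount() { return record.getFriendIds().size(); }

    // Resolves the friend IDs one level deep, on demand; unknown IDs are skipped.
    List<FacebookUserRecord> getFriends() {
        List<FacebookUserRecord> friends = new ArrayList<>();
        for (long friendId : record.getFriendIds()) {
            FacebookUserRecord friend = lookup.apply(friendId);
            if (friend != null) {
                friends.add(friend);
            }
        }
        return friends;
    }
}
```

The wrapper survives regeneration of the sources because the generated class is never modified. The trade-off is that callers work with a FacebookUserView and have to unwrap it with getRecord() before serializing.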