I'm looking at using Avro on Hadoop, but I am concerned about the serialization of large data structures and about how to add methods to the (data) classes.
The example (taken from http://blog.voidsearch.com/bigdata/apache-avro-in-practice/) shows a model of Facebook users.
{
"namespace": "test.avro",
"name": "FacebookUser",
"type": "record",
"fields": [
{"name": "name", "type": "string"},
...,
{"name": "friends", "type": {"type": "array", "items": "FacebookUser"}}
]
}
Does Avro serialize the complete social graph of a FacebookUser in this model?
[That is, if I want to serialize one user, does the serialization include all its friends, their friends, and so on?]
If the answer is yes, I'd rather store the IDs of friends instead of references and look them up in my application whenever needed. In that case I would like to be able to add a method that returns the actual friends instead of their IDs.
How can I wrap/extend generated Avro Java classes to add methods?
(Also to add methods that return, for example, the friend count.)
Regarding the second question: how can I wrap/extend generated Avro Java classes to add methods?
You can use AspectJ to inject new methods into an existing or generated class (an inter-type declaration). AspectJ is required only at compile time. The approach is illustrated below.
Define a Person record in Avro IDL (person.avdl):
@namespace("net.tzolov.avro.extend")
protocol PersonProtocol {
record Person {
string firstName;
string lastName;
}
}
Use Maven and the avro-maven-plugin to generate Java sources from the AVDL file:
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.6.3</version>
</dependency>
......
<plugin>
<groupId>org.apache.avro</groupId>
<artifactId>avro-maven-plugin</artifactId>
<version>1.6.3</version>
<executions>
<execution>
<id>generate-avro-sources</id>
<phase>generate-sources</phase>
<goals>
<goal>idl-protocol</goal>
</goals>
<configuration>
<sourceDirectory>src/main/resources/avro</sourceDirectory>
<outputDirectory>${project.build.directory}/generated-sources/java</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
The configuration above assumes that the person.avdl file is in src/main/resources/avro. Sources are generated in target/generated-sources/java.
The generated Person.java has two getters, getFirstName() and getLastName(). To extend it with another method, getCompleteName(), which returns the first and last name separated by a space, you can inject that method with the following aspect:
package net.tzolov.avro.extend;
import net.tzolov.avro.extend.Person;
public aspect PersonAspect {
public String Person.getCompleteName() {
return this.getFirstName() + " " + this.getLastName();
}
}
Use the aspectj-maven-plugin to weave this aspect into the generated code:
<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjrt</artifactId>
<version>1.6.12</version>
</dependency>
<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjweaver</artifactId>
<version>1.6.12</version>
</dependency>
....
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>aspectj-maven-plugin</artifactId>
<version>1.2</version>
<dependencies>
<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjrt</artifactId>
<version>1.6.12</version>
</dependency>
<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjtools</artifactId>
<version>1.6.12</version>
</dependency>
</dependencies>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>test-compile</goal>
</goals>
</execution>
</executions>
<configuration>
<source>6</source>
<target>6</target>
</configuration>
</plugin>
After weaving, the injected method can be called like any other method of Person:
@Test
public void testPersonCompleteName() throws Exception {
Person person = Person.newBuilder()
.setFirstName("John").setLastName("Atanasoff").build();
Assert.assertEquals("John Atanasoff", person.getCompleteName());
}
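If weaving is not an option, the same kind of convenience methods could also live in a plain wrapper around the generated class. The following is only a minimal sketch, assuming friends are stored as IDs as the question proposes. UserRecord is a hand-written stand-in for an Avro-generated class, and UserView and the in-memory lookup map are hypothetical names for illustration, not part of Avro:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hand-written stand-in for an Avro-generated record (hypothetical).
// Friends are stored as IDs, so one user can be serialized on its own.
class UserRecord {
    private final long id;
    private final String name;
    private final List<Long> friendIds;

    UserRecord(long id, String name, List<Long> friendIds) {
        this.id = id;
        this.name = name;
        this.friendIds = friendIds;
    }

    long getId() { return id; }
    String getName() { return name; }
    List<Long> getFriendIds() { return friendIds; }
}

// Hypothetical wrapper that adds methods by composition instead of weaving.
class UserView {
    private final UserRecord record;
    // Assumed lookup; in a real application this might be a database or cache.
    private final Map<Long, UserRecord> usersById;

    UserView(UserRecord record, Map<Long, UserRecord> usersById) {
        this.record = record;
        this.usersById = usersById;
    }

    // Number of stored friend IDs, whether or not they can be resolved.
    int getFriendCount() {
        return record.getFriendIds().size();
    }

    // Resolves friend IDs to records, skipping IDs missing from the lookup.
    List<UserRecord> getFriends() {
        List<UserRecord> friends = new ArrayList<>();
        for (Long friendId : record.getFriendIds()) {
            UserRecord friend = usersById.get(friendId);
            if (friend != null) {
                friends.add(friend);
            }
        }
        return friends;
    }
}

class UserViewDemo {
    public static void main(String[] args) {
        Map<Long, UserRecord> users = new HashMap<>();
        users.put(1L, new UserRecord(1L, "Alice", Arrays.asList(2L, 3L)));
        users.put(2L, new UserRecord(2L, "Bob", Arrays.asList(1L)));

        UserView alice = new UserView(users.get(1L), users);
        System.out.println(alice.getFriendCount());    // 2 stored IDs
        System.out.println(alice.getFriends().size()); // 1, since ID 3 is not in the lookup
    }
}
```

The trade-off compared with the aspect is that callers work with UserView rather than with the generated class itself, but no extra compiler or build plugin is needed.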