I have a third-party class that I am trying to use in Hadoop, and thus need to have it implement Writable. The problem is that the way Hadoop uses Writable is to create an object o = new SomeObject(), then call o.readFields(in) to deserialize it, and in my situation I cannot create the empty object:
public abstract class Cube {
    protected final int size;
    protected Cube(int size) { this.size = size; }
}
Note that size is final.
public class RealCube extends Cube {
    public RealCube(int size) { super(size); }
}
Here RealCube only has one super constructor to call, and that constructor sets the final variable in the abstract super class.
public class RealCubeWritable implements Writable {
    public void write(DataOutput out) throws IOException {
        /* write out the size */
    }
    public void readFields(DataInput in) throws IOException {
        /* yikes! need to set the size */
    }
}
When we get down to trying to implement RealCubeWritable, I cannot have a no-argument RealCubeWritable() constructor, and I cannot know the actual size until the DataInput stream is examined.
So it seems like the only way to do this in Hadoop is to use a wrapper. What I am wondering is whether there is a way to use a wrapper but still have RealCubeWritable behave like RealCube. I've looked into using dynamic proxy classes, but I'm not sure whether this will work (or how to actually do it).
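Something like the following sketch is what I mean by a wrapper (for illustration it assumes RealCube exposes a getSize() accessor, which the real class may not have):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class RealCubeWritable implements Writable {
    private RealCube cube;                      // wrapped instance; null until set

    public RealCubeWritable() { }               // the no-arg constructor Hadoop needs
    public RealCubeWritable(RealCube cube) { this.cube = cube; }

    public RealCube get() { return cube; }      // callers unwrap to reach the RealCube

    public void write(DataOutput out) throws IOException {
        out.writeInt(cube.getSize());           // getSize() is assumed, not part of Cube above
    }

    public void readFields(DataInput in) throws IOException {
        cube = new RealCube(in.readInt());      // size is finally known here
    }
}

The annoyance is that downstream code has to call get() to unwrap, which is why I was wondering whether a dynamic proxy could hide that.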
Thanks!
If you genuinely have no control over the Cube class, then I'm not sure you have many (pleasant) options:
Is size relatively small (i.e. can it only be a limited set / range of values)? If so, you could create an instance of RealCube for each valid size value and, again using a custom Serialization implementation, pick the right Cube instance based upon the size read from the input stream.
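A rough sketch of that idea (CubeCache and the MAX_SIZE bound are hypothetical names/values; it assumes size is small enough that pre-building every instance up front is cheap):

import java.io.DataInput;
import java.io.IOException;

public class CubeCache {
    private static final int MAX_SIZE = 16;                 // hypothetical upper bound on size
    private static final RealCube[] CUBES = new RealCube[MAX_SIZE + 1];

    static {
        for (int s = 1; s <= MAX_SIZE; s++) {
            CUBES[s] = new RealCube(s);                      // one shared instance per valid size
        }
    }

    /** Picks the pre-built cube matching the size read from the stream. */
    public static RealCube forSize(DataInput in) throws IOException {
        int size = in.readInt();
        if (size < 1 || size > MAX_SIZE) {
            throw new IOException("unexpected cube size: " + size);
        }
        return CUBES[size];
    }
}

This trades a little up-front memory for never needing to construct an empty RealCube at deserialization time (note the instances are shared, so it only works if RealCube is immutable).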