Tags: java, serialization, nosql, cassandra, hector

Store and retrieve a float[] to/from Cassandra using Hector


I have the following Cassandra schema:

ColumnFamily: FloatArrays {
    SCKey: SuperColumn Key (Integer) {
        Key: FloatArray (float[]) {
            field (String): value (String)
        }
    }
}

In order to insert data that adheres to this schema I created the following template in Hector:

template = new ThriftSuperCfTemplate<Integer, FloatArray, String>(
    keyspace, "FloatArrays", IntegerSerializer.get(),
    FloatArraySerializer.get(), StringSerializer.get());

To (de-)serialize the FloatArray I created (and unit tested) a custom Serializer:

public class FloatArraySerializer extends AbstractSerializer<FloatArray> {

    private static final FloatArraySerializer instance = 
        new FloatArraySerializer();

    public static FloatArraySerializer get() {
        return instance;
    }

    @Override
    public FloatArray fromByteBuffer(ByteBuffer buffer) {
        buffer.rewind();
        FloatBuffer floatBuf = buffer.asFloatBuffer();
        float[] floats = new float[floatBuf.limit()];
        if (floatBuf.hasArray()) {
            floats = floatBuf.array(); 
        } else {
            floatBuf.get(floats, 0, floatBuf.limit());
        }
        return new FloatArray(floats);
    }

    @Override
    public ByteBuffer toByteBuffer(FloatArray theArray) {
        float[] floats = theArray.getFloats();
        ByteBuffer byteBuf = ByteBuffer.allocate(4 * floats.length);
        FloatBuffer floatBuf = byteBuf.asFloatBuffer();
        floatBuf.put(floats);
        byteBuf.rewind();
        return byteBuf;
    }

}

Now comes the tricky bit. Storing and then retrieving an array of floats does not return the same result. In fact, the number of elements in the array isn't even the same. The code I use to retrieve the result is shown below:

SuperCfResult<Integer, FloatArray, String> result = 
    template.querySuperColumns(hash);
for (FloatArray floatArray: result.getSuperColumns()) {
    // Do something with the FloatArrays
}

Am I making a conceptual mistake here? I'm quite new to Cassandra/Hector, and right now I have no clue where it goes wrong. The Serializer seems to be OK. Can you give me some pointers for continuing my search? Many thanks!


Solution

  • I think you're on the right track. When I work with ByteBuffers I find I sometimes need the statement:

    import org.apache.thrift.TBaseHelper;
    
           ... 
    
    ByteBuffer aCorrectedByteBuffer = TBaseHelper.rightSize(theByteBufferIWasGiven);
    

    The value of a byte buffer sometimes starts at a nonzero offset into its backing buffer, but the Serializers seem to assume that the value starts at offset 0. As best I can tell, TBaseHelper.rightSize corrects the offsets so that the assumptions in the Serializer implementation hold.

    The difference between the length of the array going in and the array coming out is the result of starting at the wrong offset. The first byte or two of the serialized value contain the length of the array.
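
The offset problem can be reproduced with plain java.nio, without Cassandra or Hector. The following is a minimal sketch, not Hector's own code: the class name OffsetAwareFloatCodec and the 8-byte "header" are made up for illustration, and the assumption is that the buffer handed to the Serializer is a window into a larger array (position greater than 0). Calling rewind() on such a buffer moves the position back to the start of the backing array, so the decoder reads extra bytes; decoding from the buffer's current position avoids that.

```java
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;
import java.util.Arrays;

public class OffsetAwareFloatCodec {

    // Encode floats as consecutive 4-byte values (big-endian, ByteBuffer's default).
    public static ByteBuffer toByteBuffer(float[] floats) {
        ByteBuffer byteBuf = ByteBuffer.allocate(4 * floats.length);
        byteBuf.asFloatBuffer().put(floats);
        return byteBuf; // position is still 0: the float view tracks its own position
    }

    // Decode only the bytes between the buffer's position and limit.
    public static float[] fromByteBuffer(ByteBuffer buffer) {
        // duplicate() shares the bytes but has an independent position and limit,
        // so the caller's buffer is left untouched and rewind() is never needed.
        FloatBuffer floatBuf = buffer.duplicate().asFloatBuffer();
        float[] floats = new float[floatBuf.remaining()];
        floatBuf.get(floats);
        return floats;
    }

    // Decoding in the style of the question (rewind first), kept for comparison.
    public static float[] fromByteBufferWithRewind(ByteBuffer buffer) {
        ByteBuffer copy = buffer.duplicate();
        copy.rewind(); // position becomes 0, discarding the offset
        FloatBuffer floatBuf = copy.asFloatBuffer();
        float[] floats = new float[floatBuf.limit()];
        floatBuf.get(floats, 0, floatBuf.limit());
        return floats;
    }

    public static void main(String[] args) {
        float[] original = {1.5f, -2.0f, 3.25f};
        byte[] payload = toByteBuffer(original).array();

        // Simulate a value embedded in a larger message: 8 header bytes come first.
        byte[] message = new byte[8 + payload.length];
        System.arraycopy(payload, 0, message, 8, payload.length);
        ByteBuffer slice = ByteBuffer.wrap(message, 8, payload.length);

        // prints [1.5, -2.0, 3.25]
        System.out.println(Arrays.toString(fromByteBuffer(slice)));
        // prints 5: the 8 header bytes were decoded as two extra floats
        System.out.println(fromByteBufferWithRewind(slice).length);
    }
}
```

If this assumption matches what Hector passes in, the same idea applied to the question's FloatArraySerializer would be to drop the buffer.rewind() call and size the array with floatBuf.remaining(); TBaseHelper.rightSize, as suggested above, is an alternative way to get a buffer that starts at offset 0.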