cassandra, datastax-java-driver

Java Cassandra Datastax driver custom codec not handling collection types


I have defined a custom codec, using this tutorial as a reference: https://docs.datastax.com/en/latest-java-driver/java-driver/reference/customCodecs.html. Here is my codec implementation:

package org.questqa.server.database.codecs;

import java.io.ByteArrayInputStream;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;

import org.questqa.server.database.fetchers.UserInfoFetcher;
import org.questqa.server.entity.User;
import org.springframework.beans.factory.annotation.Autowired;

import com.datastax.driver.core.DataType;
import com.datastax.driver.core.ProtocolVersion;
import com.datastax.driver.core.TypeCodec;
import com.datastax.driver.core.exceptions.InvalidTypeException;

public class UserCodec extends TypeCodec<User>
{

    @Autowired
    UserInfoFetcher userFetcher;

    public UserCodec(DataType cqlType, Class<User> javaClass) 
    {
        super(cqlType, javaClass);
    }

    @Override
    public User deserialize(ByteBuffer buffer, ProtocolVersion arg1) throws InvalidTypeException {
        // TODO Auto-generated method stub
        System.out.println("Executing deserialize");
        User user = new User();
        try {
            System.out.println("Size of ByteBuffer: " + buffer.array().length);
            String userId = new String(buffer.array(), "UTF-8");
            System.out.println("User id to set: " + userId);
            System.out.println("Setting user id...");
            user.setUserId(userId);
            System.out.println("User id set: " + user.getUserId());
        } catch (UnsupportedEncodingException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return user;
    }
    @Override
    public String format(User user) throws InvalidTypeException 
    {
        System.out.println("Executing format");
        return user.getUserId();
    }
    @Override
    public User parse(String userId) throws InvalidTypeException 
    {
        // TODO Auto-generated method stub
        System.out.println("Executing parse");
        User user = new User();
        user.setUserId(userId);
        return user;
    }
    @Override
    public ByteBuffer serialize(User user, ProtocolVersion arg1) throws InvalidTypeException {
        // TODO Auto-generated method stub
        System.out.println("Executing serialize");
        ByteBuffer buffer = null;
        System.out.println("Userid to serialize: " + user.getUserId());
        try {
            buffer = ByteBuffer.wrap(user.getUserId().getBytes("UTF-8"));
            String bufferStr = new String(buffer.array(), "UTF-8");
            System.out.println("ByteBuffer to be returned: \nLength: " + buffer.array().length +
                    "\nValue: " + bufferStr);
        } catch (UnsupportedEncodingException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        return buffer;
    }
}

Basically, in the serialize method I extract the userId string (using user.getUserId()) and store it in my Cassandra table as a string. In deserialize, I convert the given ByteBuffer back to a string and set it on a new User. Serialization works fine for string, set and list columns: for a Set, the codec runs serialize() on each element and stores the corresponding string in the table.

The problem is deserializing a set column. deserialize is executed once per element, so a set of 5 elements produces 5 calls, but on each call the ByteBuffer appears to contain the entire set rather than the single element. Inspecting the ByteBuffer confirms that all of the individual elements are in there. I don't understand why this happens, and Google hasn't been much help. Any suggestions or solutions are welcome; I've been stuck on this for quite a while.

Thanks!


Solution

  • AbstractCollectionCodec is what SetCodec and ListCodec use to serialize/deserialize collections. On deserialize, they do not copy each element into its own buffer. Instead they pass the underlying element codec (in this case, UserCodec) a view of the whole ByteBuffer, positioned where the element begins (ByteBuffer.position()) and limited to where the individual element ends (ByteBuffer.limit()).

    When you call buffer.array() you are getting the entire backing array of the ByteBuffer, regardless of the position or limit.
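
    The difference can be reproduced with plain java.nio, without the driver. The following is a minimal sketch: the class and method names (BufferSliceDemo, elementView, readWholeArray, readRemaining) are made up for illustration, and the three user ids packed back to back stand in for a collection's bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferSliceDemo {

    /** Buggy read: array() returns the whole backing array, ignoring position and limit. */
    static String readWholeArray(ByteBuffer buffer) {
        return new String(buffer.array(), StandardCharsets.UTF_8);
    }

    /** Correct read: copy only the bytes between position and limit. */
    static String readRemaining(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // leave the caller's position untouched
        byte[] data = new byte[view.remaining()];
        view.get(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    /** Returns a view of `length` bytes starting at `offset`, sharing the backing array. */
    static ByteBuffer elementView(ByteBuffer whole, int offset, int length) {
        ByteBuffer view = whole.duplicate();
        view.position(offset);
        view.limit(offset + length);
        return view;
    }

    public static void main(String[] args) {
        // Three "elements" stored back to back in one backing array.
        ByteBuffer whole = ByteBuffer.wrap("alicebobcarol".getBytes(StandardCharsets.UTF_8));
        ByteBuffer bob = elementView(whole, 5, 3);

        System.out.println(readWholeArray(bob)); // prints "alicebobcarol"
        System.out.println(readRemaining(bob));  // prints "bob"
    }
}
```

    Both calls receive the same three-byte view; only the one that honors position and limit sees a single element.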

    What should work better for you is to use Bytes.getArray(ByteBuffer) (from com.datastax.driver.core.utils.Bytes), which returns a byte array containing only the data between position and limit, i.e.:

    @Override
    public User deserialize(ByteBuffer buffer, ProtocolVersion arg1) throws InvalidTypeException {
        // TODO Auto-generated method stub
        System.out.println("Executing deserialize");
        User user = new User();
        try {
            byte[] data = Bytes.getArray(buffer);
            System.out.println("Size of ByteBuffer: " + data.length);
            String userId = new String(data, "UTF-8");
            System.out.println("User id to set: " + userId);
            System.out.println("Setting user id...");
            user.setUserId(userId);
            System.out.println("User id set: " + user.getUserId());
        } catch (UnsupportedEncodingException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return user;
    }
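
    To see why each call receives a bounded view, it can help to walk a collection by hand. The sketch below uses only the JDK and assumes a simplified layout of a 4-byte element count followed by 4-byte length-prefixed elements; it is modeled on the behavior described above, not copied from the driver. CollectionWalkDemo and its methods are hypothetical names. The element decoder uses Charset.decode on a duplicate as a JDK-only alternative to a byte-array copy.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectionWalkDemo {

    /** Element decoder that respects position and limit, using only the JDK. */
    static String decodeElement(ByteBuffer element) {
        // decode() consumes the buffer it is given, so work on a duplicate.
        return StandardCharsets.UTF_8.decode(element.duplicate()).toString();
    }

    /** Encodes strings as [count][len][bytes][len][bytes]... (assumed layout). */
    static ByteBuffer encode(List<String> values) {
        int size = 4;
        List<byte[]> encoded = new ArrayList<>();
        for (String value : values) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            size += 4 + bytes.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(encoded.size());
        for (byte[] bytes : encoded) {
            out.putInt(bytes.length);
            out.put(bytes);
        }
        out.flip();
        return out;
    }

    /** Walks the collection, handing each element to the decoder as a bounded view. */
    static List<String> decode(ByteBuffer input) {
        ByteBuffer in = input.duplicate();
        int count = in.getInt();
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int length = in.getInt();
            ByteBuffer element = in.duplicate();
            element.limit(element.position() + length); // bound the view to one element
            in.position(in.position() + length);        // advance past that element
            result.add(decodeElement(element));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("alice", "bob", "carol");
        System.out.println(decode(encode(ids))); // prints "[alice, bob, carol]"
    }
}
```

    Every element view shares one backing array, so an element decoder that calls array() would return the full encoded collection on each of the three calls.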
    

    Since you are simply storing your User instance in Cassandra as a string representing its getUserId(), what you could do instead is subclass MappingCodec from the extras module, so that the codec only maps User <-> String (UserId) and leaves the byte handling to the built-in varchar codec, i.e.:

    import com.datastax.driver.core.TypeCodec;
    import com.datastax.driver.extras.codecs.MappingCodec;
    
    public class UserCodec extends MappingCodec<User, String> {
    
        public UserCodec() {
            super(TypeCodec.varchar(), User.class);
        }
    
        @Override
        protected User deserialize(String userId) {
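            // Uses the no-arg constructor and setter shown in the question.
            User user = new User();
            user.setUserId(userId);
            return user;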
        }
    
        @Override
        protected String serialize(User user) {
            return user.getUserId();
        }
    }
    

    You can read more about MappingCodec on the docs site.
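
    The reason the mapping approach avoids this class of bug is that the custom code never touches a ByteBuffer. The sketch below illustrates that delegation with the JDK only. It is not the driver's MappingCodec: Utf8Codec, Mapped and the nested User are hypothetical stand-ins written for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class MappingSketch {

    /** Hypothetical inner codec: the only place that touches raw bytes. */
    static final class Utf8Codec {
        ByteBuffer serialize(String value) {
            return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
        }

        String deserialize(ByteBuffer bytes) {
            // Decode a duplicate so only position..limit is read
            // and the caller's buffer is not consumed.
            return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
        }
    }

    /** Hypothetical mapping wrapper: converts T <-> String, delegates bytes to the inner codec. */
    static final class Mapped<T> {
        private final Utf8Codec inner = new Utf8Codec();
        private final Function<T, String> toInner;
        private final Function<String, T> fromInner;

        Mapped(Function<T, String> toInner, Function<String, T> fromInner) {
            this.toInner = toInner;
            this.fromInner = fromInner;
        }

        ByteBuffer serialize(T value) {
            return value == null ? null : inner.serialize(toInner.apply(value));
        }

        T deserialize(ByteBuffer bytes) {
            return bytes == null ? null : fromInner.apply(inner.deserialize(bytes));
        }
    }

    /** Minimal stand-in for the question's User class. */
    static final class User {
        private final String userId;

        User(String userId) {
            this.userId = userId;
        }

        String getUserId() {
            return userId;
        }
    }

    public static void main(String[] args) {
        Mapped<User> codec = new Mapped<>(User::getUserId, User::new);

        // A view over the last three bytes of a larger backing array.
        ByteBuffer whole = ByteBuffer.wrap("alicebob".getBytes(StandardCharsets.UTF_8));
        ByteBuffer view = whole.duplicate();
        view.position(5);

        System.out.println(codec.deserialize(view).getUserId()); // prints "bob"
    }
}
```

    The mapping functions here only ever see a String, so there is no position or limit to get wrong in user code.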