java serialization hazelcast

Hazelcast Java Serialization/Deserialization ArrayList Pitfall


I've switched from Memcached to Hazelcast. After a while I noticed in the Management Center that the cache was larger than usual.

So I did the following:

  1. Before calling IMap.set(key, value), where the value is an ArrayList, I serialize the value to a file. The file is 128K.
  2. After IMap.set() has been called, I IMap.get() the same entry and serialize it to a second file. That file is suddenly 6 MB.

The object in question contains many objects that are referenced multiple times within the same structure.

I opened the two binary files and saw that the 6 MB file contains a lot of duplicated data. The serialization used by Hazelcast somehow makes copies of the referenced objects instead of keeping the shared references.

  • All the classes instantiated for the cache implement Serializable, except the enums.

  • With Memcached the value size is 128K in both cases.

  • I tried Kryo with Hazelcast and it made no real difference; the value was still over 6 MB.

Has anyone had a similar problem with Hazelcast? If so, how did you solve it without changing the cache provider?

I can provide the object structure and try to reproduce the problem with non-sensitive data if anyone needs it.


Solution

  • I don't claim this is the definitive answer, but after losing a day on it I finally found a workaround. I can't say whether this behavior is a feature or a bug that should be reported.

    In Hazelcast, if you put an ArrayList into an IMap as the value, the list is serialized entry by entry. This means that if the list holds 100 references to the same instance A, and A is 6K, the value takes 600K in Hazelcast. The short, rough test at the end of this answer demonstrates it.

    To work around this with Java serialization, wrap the ArrayList in another object. Presumably this works because the wrapper is then written as a single Java-serialized object graph, in which a shared instance is stored only once.

    (This applies only to classes that use plain Serializable, not to other serialization implementations.)

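    @Test
    public void start() throws Exception {
        // produceHazelcastClient() is the author's own helper (not shown here).
        HazelcastInstance client = produceHazelcastClient();

        // Data is the author's own Serializable class (not shown here).
        Data data = new Data();

        // Add the same instance 1000 times: one object, 1000 references to it.
        ArrayList<Data> datas = new ArrayList<>();
        IntStream.range(0, 1000).forEach(i -> datas.add(data));

        // File size before the list goes through Hazelcast.
        writeFile(datas, "DataLeoBefore", "1");

        client.getMap("data").put("LEO", datas);
        Object redeserialized = client.getMap("data").get("LEO");

        // File size after the round trip through Hazelcast.
        writeFile(redeserialized, "DataLeoAfter", "1");
    }

    // Serializes the value with Java serialization and writes it to ./<fileName>_<key>.
    public void writeFile(Object value, String key, String fileName) {
        try {
            Files.write(Paths.get("./" + fileName + "_" + key),
                    SerializationUtils.serialize((ArrayList<?>) value));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }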
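
The effect described above can be reproduced without a Hazelcast cluster. The following is a minimal sketch using only the JDK, not Hazelcast's actual code path: it assumes that "entry by entry" means every element ends up in its own Java serialization stream, and it simulates that by giving each element a separate ObjectOutputStream. The names `SharedReferenceDemo`, `Payload`, and `PayloadListHolder` are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SharedReferenceDemo {

    // Hypothetical payload of roughly 6K, standing in for the answer's Data class.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final byte[] bytes = new byte[6 * 1024];
    }

    // Hypothetical wrapper: the list becomes a field of one Serializable object.
    static class PayloadListHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        final ArrayList<Payload> items;

        PayloadListHolder(ArrayList<Payload> items) {
            this.items = items;
        }
    }

    // Serializes one object graph through a single ObjectOutputStream.
    static byte[] toBytes(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Simulates element-by-element serialization: a fresh stream per element,
    // so the stream cannot know that the same instance was already written.
    static long sizeElementByElement(List<? extends Serializable> list) throws IOException {
        long total = 0;
        for (Serializable element : list) {
            total += toBytes(element).length;
        }
        return total;
    }

    // Builds a list that holds the same Payload instance `copies` times.
    static ArrayList<Payload> sharedList(int copies) {
        Payload shared = new Payload();
        ArrayList<Payload> list = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            list.add(shared);
        }
        return list;
    }

    public static void main(String[] args) throws Exception {
        ArrayList<Payload> list = sharedList(100);

        System.out.println("single stream:      " + toBytes(list).length + " bytes");
        System.out.println("element by element: " + sizeElementByElement(list) + " bytes");

        // Round trip of the wrapper through one stream keeps the shared reference.
        PayloadListHolder copy =
                (PayloadListHolder) fromBytes(toBytes(new PayloadListHolder(list)));
        System.out.println("still shared: " + (copy.items.get(0) == copy.items.get(1)));
    }
}
```

Within a single stream, Java serialization writes the shared instance once and then only short back-references, so the first number stays near the size of one payload while the second grows with the number of entries. Storing the wrapper (here `PayloadListHolder`) in the IMap instead of the bare ArrayList is the workaround the answer describes; whether a given Hazelcast version treats a bare ArrayList this way is worth checking against its serialization documentation.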