java caching datagrid hazelcast in-memory

The fastest way to populate a Hazelcast in-memory data grid

What is the fastest way to populate a Hazelcast data grid? Reading through the documentation, I can see a couple of variants:

1. Use multithreading and IMap.set
2. Use multithreading and IMap.putAll
3. Use distributed execution to populate the grid from all participants

My benchmark shows that IMap.putAll is faster than IMap.set. However, the Hazelcast documentation states that IMap.putAll does not guarantee that all entries are inserted atomically.

Can someone clarify what the fastest way to populate a data grid is?

Is variant number 3 a good choice?

Solution

I see the same three options. As you mentioned, option two does not guarantee that everything is put into the map atomically, but if you are only loading data and you wait for all threads to finish their IMap::putAll calls, you should be fine.

Apart from that, IMap::set is the alternative. In either case you want to multithread the loading process. I would experiment with different thread counts. Loading the data from a client is normally recommended, since it keeps the nodes free for storage operations.
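A minimal sketch of the multithreaded putAll approach, written against java.util.Map so it runs without a cluster. Since IMap implements java.util.Map, an IMap obtained from HazelcastInstance.getMap(...) could be passed as the target. BulkLoader, its parameters, and the batch size and thread count in main are hypothetical and only for illustration; the right values have to be found by benchmarking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Hazelcast.
final class BulkLoader {

    /**
     * Copies every entry of source into target by calling putAll on
     * fixed-size batches from a thread pool. Returns only after all
     * batches have been written, or throws if any batch failed.
     */
    static <K, V> void load(Map<K, V> target, Map<K, V> source,
                            int threads, int batchSize) throws Exception {
        if (threads < 1 || batchSize < 1) {
            throw new IllegalArgumentException("threads and batchSize must be >= 1");
        }

        // Split the source into batches of at most batchSize entries.
        List<Map<K, V>> batches = new ArrayList<>();
        Map<K, V> current = new HashMap<>();
        for (Map.Entry<K, V> entry : source.entrySet()) {
            current.put(entry.getKey(), entry.getValue());
            if (current.size() == batchSize) {
                batches.add(current);
                current = new HashMap<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Map<K, V> batch : batches) {
                pending.add(pool.submit(() -> target.putAll(batch)));
            }
            // Wait for every batch; get() rethrows a failure from the worker.
            for (Future<?> future : pending) {
                future.get();
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<Integer, String> source = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            source.put(i, "value-" + i);
        }
        // Stand-in for an IMap; a ConcurrentHashMap keeps the sketch runnable locally.
        Map<Integer, String> target = new ConcurrentHashMap<>();
        load(target, source, 4, 500);
        System.out.println(target.size()); // prints 10000
    }
}
```

Waiting on every Future is what makes the missing atomicity of putAll harmless here: readers should simply not touch the map until load has returned.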

I have never benchmarked your third option myself, but it would work as well. I am just not sure it is worth the additional work.
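Part of that additional work is deciding which member loads which part of the data, so that the members do not all load everything. The sketch below shows only that splitting step, assuming the data can be addressed by a contiguous range of numeric keys. KeySlices and Slice are hypothetical names; handing one slice to each member (for example through Hazelcast's IExecutorService) is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hazelcast.
final class KeySlices {

    /** Half-open key range [from, to) that one member would load. */
    static final class Slice {
        final long from;
        final long to;

        Slice(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return "[" + from + ", " + to + ")";
        }
    }

    /** Splits keys 0..totalKeys-1 into one near-equal slice per member. */
    static List<Slice> split(long totalKeys, int members) {
        if (totalKeys < 0 || members < 1) {
            throw new IllegalArgumentException("totalKeys must be >= 0 and members >= 1");
        }
        List<Slice> slices = new ArrayList<>();
        long base = totalKeys / members;
        long remainder = totalKeys % members;
        long start = 0;
        for (int i = 0; i < members; i++) {
            // The first `remainder` members take one extra key each.
            long size = base + (i < remainder ? 1 : 0);
            slices.add(new Slice(start, start + size));
            start += size;
        }
        return slices;
    }

    public static void main(String[] args) {
        // Ten keys over three members.
        System.out.println(split(10, 3)); // prints [[0, 4), [4, 7), [7, 10)]
    }
}
```

Each member would then read its own slice from the source system and write it with IMap::putAll or IMap::set, exactly as a client-side loader thread would.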

How much data do you want to load that you are concerned it could be slow? Do you already know that loading is slow? Do you use Java serialization (a huge performance killer)? Do you use indexes (they have to be built while the data is being put)?

There are normally a lot of optimizations you can apply to speed up not only data loading but also normal operation.