Tags: java, ignite, gridgain

IgniteDataStreamer with allowOverwrite is slower than putAll?


I've written some benchmarks for data loading. I expected IgniteDataStreamer to be faster than (or as fast as) putAll(...), and it is, except in this case:

  • number of server nodes: 5
  • cache backups: 1
  • write synchronization mode: FULL_SYNC
  • data streamer allow overwrite: true
  • all other data streamer settings left at their defaults

Results are:

putAll(...) speed: 126630 per sec

data streamer speed: 30430 per sec

When overwrites are not allowed, or with 0 backups and PRIMARY_SYNC, the data streamer is about 2 times faster than putAll(...), as expected.

So does Ignite's performance tip to use the data streamer not hold here? What are the possible reasons, and how can the data streamer's slowdown be avoided?

Benchmark code snippet:

// Build the whole map first, then load it with a single putAll call
for (int i = 0; i < size; i++) {
    pojoMap.put(String.valueOf(i), pojo);
}
cache.putAll(pojoMap);

vs

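// Set to true for the slow case described above
igniteDataStreamer.allowOverwrite(false);
for (int i = 0; i < size; i++) {
    igniteDataStreamer.addData(String.valueOf(i), pojo);
}
// Stream any remaining buffered entries and block until they have been streamed
igniteDataStreamer.flush();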

GridGain CE 8.7.6
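The snippets above don't show how the per-second figures were measured. Below is a minimal, stdlib-only sketch of one way to compute such a rate, assuming the timed region covers building the map plus the single putAll call. `ThroughputSketch`, `entriesPerSecond` and the `ConcurrentHashMap` standing in for the Ignite cache are all hypothetical, so the number it prints says nothing about Ignite itself; it only shows the shape of the measurement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThroughputSketch {
    // Hypothetical helper: runs `load` once and returns size / elapsed seconds.
    static double entriesPerSecond(int size, Runnable load) {
        long start = System.nanoTime();
        load.run();
        // Guard against a zero interval on very fast runs
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return size / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        int size = 100_000;
        String pojo = "value"; // stand-in for the POJO in the question
        // Stand-in for IgniteCache: this measures the benchmark's shape, not Ignite
        Map<String, String> cache = new ConcurrentHashMap<>();

        double rate = entriesPerSecond(size, () -> {
            Map<String, String> pojoMap = new HashMap<>();
            for (int i = 0; i < size; i++) {
                pojoMap.put(String.valueOf(i), pojo);
            }
            cache.putAll(pojoMap);
        });
        System.out.printf("putAll-style load: %.0f entries/sec%n", rate);
    }
}
```

In a real comparison, each variant would normally be run a few times before measuring, so that JIT warm-up does not skew whichever variant happens to run first.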


Solution

  • This is easily explained. When allowOverwrite is true, the data streamer applies the data through individual cache.put calls, which is much slower than a standard cache.putAll. I'm not sure why the data streamer can't use putAll in this scenario, at least for atomic caches (individual cache.put calls make sense for transactional caches, where they help avoid deadlocks). I will check possible optimizations with the community.

    When allowOverwrite is false, the streamer sends data directly to all the nodes that store the primary and backup copies and applies the updates in place. For your cluster, this should take just 5 network requests (one from the application to each node) if all the data fits into 5 batches.

    Overall, use allowOverwrite=false for initial data loading. For allowOverwrite=true, the community will look into whether there is room for an internal optimization, at least for atomic caches.
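The difference the answer describes between per-entry cache.put calls and a single putAll can be illustrated with a toy model. This is plain Java, not Ignite code: `CountingCache` stands in for a remote cache, and `Receiver` is a hypothetical interface shaped loosely like Ignite's `StreamReceiver.receive(cache, entries)`. The sketch only counts how many update calls one 512-entry batch turns into under each strategy. The batched variant is an illustration, not something the answer says Ignite does, and per the answer it would only be a candidate for atomic caches.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReceiverSketch {
    // Stand-in for a remote cache; counts how many update calls reach it
    static class CountingCache<K, V> {
        final Map<K, V> data = new HashMap<>();
        int putCalls = 0;
        int putAllCalls = 0;

        void put(K key, V val) { putCalls++; data.put(key, val); }

        void putAll(Map<? extends K, ? extends V> m) { putAllCalls++; data.putAll(m); }
    }

    // Hypothetical interface, shaped loosely like Ignite's StreamReceiver.receive(cache, entries)
    interface Receiver<K, V> {
        void receive(CountingCache<K, V> cache, Collection<Map.Entry<K, V>> entries);
    }

    // One put call per entry: the behaviour the answer describes for allowOverwrite=true
    static <K, V> Receiver<K, V> individual() {
        return (cache, entries) -> {
            for (Map.Entry<K, V> e : entries) {
                cache.put(e.getKey(), e.getValue());
            }
        };
    }

    // One putAll call per batch; the TreeMap keeps keys in a consistent (sorted) order
    static <K extends Comparable<K>, V> Receiver<K, V> batched() {
        return (cache, entries) -> {
            Map<K, V> sorted = new TreeMap<>();
            for (Map.Entry<K, V> e : entries) {
                sorted.put(e.getKey(), e.getValue());
            }
            cache.putAll(sorted);
        };
    }

    // Builds one batch of n entries with String keys, as in the question's benchmark
    static List<Map.Entry<String, String>> batch(int n) {
        List<Map.Entry<String, String>> entries = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            entries.add(Map.entry(String.valueOf(i), "pojo"));
        }
        return entries;
    }

    public static void main(String[] args) {
        CountingCache<String, String> perEntry = new CountingCache<>();
        CountingCache<String, String> bulk = new CountingCache<>();
        ReceiverSketch.<String, String>individual().receive(perEntry, batch(512));
        ReceiverSketch.<String, String>batched().receive(bulk, batch(512));
        System.out.println("individual: put calls = " + perEntry.putCalls); // 512
        System.out.println("batched: putAll calls = " + bulk.putAllCalls);  // 1
    }
}
```

Both caches end up with the same contents; only the number of update calls differs. With FULL_SYNC and one backup, each of those individual calls is a separate synchronous update, which is a plausible reading of the slowdown in the question, although this model counts calls and does not measure time.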
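The request count in the answer's allowOverwrite=false explanation is simple arithmetic: each node receives its primary and backup entries in buffers, so the number of requests is the sum over nodes of ceil(entries on that node / buffer size). The sketch below assumes a toy placement (primary = key hash mod node count, backups on the following nodes) instead of Ignite's real affinity function. `countRequests` and its `bufferSize` parameter are hypothetical, with `bufferSize` playing the role of the streamer's per-node buffer size.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchCountSketch {
    // Toy model: each key goes to its primary node plus `backups` following nodes,
    // and each node receives its entries in buffers of at most `bufferSize`.
    static int countRequests(int keys, int nodes, int backups, int bufferSize) {
        if (backups >= nodes) {
            throw new IllegalArgumentException("backups must be fewer than nodes");
        }
        Map<Integer, Integer> entriesPerNode = new HashMap<>();
        for (int key = 0; key < keys; key++) {
            // Hypothetical placement, NOT Ignite's affinity function
            int primary = Math.floorMod(Integer.hashCode(key), nodes);
            for (int copy = 0; copy <= backups; copy++) {
                entriesPerNode.merge((primary + copy) % nodes, 1, Integer::sum);
            }
        }
        int requests = 0;
        for (int count : entriesPerNode.values()) {
            requests += (count + bufferSize - 1) / bufferSize; // ceil(count / bufferSize)
        }
        return requests;
    }

    public static void main(String[] args) {
        // 5 nodes, 1 backup, 1000 keys: each node holds 200 primary + 200 backup entries
        System.out.println(countRequests(1000, 5, 1, 512)); // 5: one request per node
        System.out.println(countRequests(1000, 5, 1, 100)); // 20: four requests per node
    }
}
```

With the question's 5 nodes and 1 backup, the first call reproduces the answer's "5 requests" case; shrinking the buffer shows how the count grows once a node's data no longer fits into a single batch.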