I've written some benchmarks for data upload. I expected IgniteDataStreamer
to be at least as fast as putAll(...),
and it is, except in this case:
Results are:
putAll(...)
speed: 126630 per sec
data streamer
speed: 30430 per sec
When overwrites are not allowed, OR with 0 backups + PRIMARY_SYNC, the data streamer is about 2 times faster than putAll, as expected.
So does Ignite's performance hint to use the data streamer break down here? What are the possible reasons, and how can I avoid the data streamer's slowdown?
Benchmark code snippet:
for (int i = 0; i < size; i++) {
    pojoMap.put(String.valueOf(i), pojo);
}
cache.putAll(pojoMap);
vs
igniteDataStreamer.allowOverwrite(false);
for (int i = 0; i < size; i++) {
    igniteDataStreamer.addData(String.valueOf(i), pojo);
}
igniteDataStreamer.flush();
GridGain CE 8.7.6
This is easily explained. If allowOverwrite is true, the data streamer sends data via individual cache.put calls. This approach is much slower than a standard cache.putAll. I'm not sure why the data streamer can't use putAll in this scenario, at least for atomic caches (individual cache.put calls make sense for transactional caches, to avoid deadlocks). I will check possible optimizations with the community.
When allowOverwrite is false, the streamer reaches out directly to all the nodes that store primary and backup copies and updates the data in place. For your cluster, that should result in just 5 network requests (one from the app to each node) if all the data fits in 5 batches.
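To make the "5 requests" arithmetic concrete, here is a rough counting sketch. It is not Ignite code and does not reflect the streamer's real internals: the class StreamerRequestModel, the nodeFor hash mapping (a stand-in for affinity), and the batchSize parameter are all hypothetical, chosen only to show how per-node batching keeps the request count low compared with one operation per entry.

```java
import java.util.HashMap;
import java.util.Map;

public class StreamerRequestModel {

    // Hypothetical stand-in for affinity: maps a key to one of nodeCount nodes.
    // Real Ignite affinity works differently; this is only for counting.
    static int nodeFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Groups entries by target node, then counts how many batches of at most
    // batchSize entries are needed per node. Each batch models one request.
    static int batchedRequests(int size, int nodeCount, int batchSize) {
        Map<Integer, Integer> entriesPerNode = new HashMap<>();
        for (int i = 0; i < size; i++) {
            entriesPerNode.merge(nodeFor(String.valueOf(i), nodeCount), 1, Integer::sum);
        }
        int requests = 0;
        for (int count : entriesPerNode.values()) {
            requests += (count + batchSize - 1) / batchSize; // ceiling division
        }
        return requests;
    }

    public static void main(String[] args) {
        int size = 1000;
        int nodes = 5;

        // Every node's share fits in a single batch: one request per node.
        System.out.println("batched, large buffer: " + batchedRequests(size, nodes, size));

        // A batch size of 1 degenerates to one request per entry.
        System.out.println("one entry per request: " + batchedRequests(size, nodes, 1));
    }
}
```

With 1000 keys spread over 5 nodes and a batch large enough to hold each node's share, the model yields 5 requests; with a batch size of 1 it yields 1000, which is the shape of the gap between batched and per-entry updates.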
Overall, use allowOverwrite=false for the initial data loading. As for allowOverwrite=true, the community will see whether there is any room for internal optimization, at least for atomic caches.