What can I do beyond using IDataStreamer and IBinaryObject to decrease insertion time into Apache Ignite.NET? Is it possible to get a significant performance increase, or is this as good as it will get?
I'm using:
IBinaryObject / WithKeepBinary
IDataStreamer
I used this example as a starting point: https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/examples/Apache.Ignite.Examples/Datagrid/DataStreamerExample.cs
Here's my usage of the IDataStreamer:
using (var ds = m_ignite.GetDataStreamer<string, IBinaryObject>(CacheName)) {
    foreach (var binaryRow in rows.Select(r => BuildRow(r))) {
        var key = binaryRow.GetField<string>(PrimaryKeyName);
        ds.AddData(key, binaryRow);
    }
}
Performance results: (5 nodes all with the same specifications)
BenchmarkDotNet=v0.10.8, OS=Windows 8.1 (6.3.9600)
Processor=Intel Xeon CPU E5-2698 v4 2.20GHz Intel Xeon CPU E5-2698 v4 2.20GHz, ProcessorCount=4
Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.2053.0
Job-UZDKMF : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.2053.0
RunStrategy=Monitoring TargetCount=1
NumRows     Mean (ms)       Per Row (ms/row)
10          359.50*         35.95*
100         465.50*         4.66*
1,000       797.80*         0.80*
10,000      4,479.80        0.45
100,000     37,611.60       0.38
500,000     184,640.00      0.37
1,000,000   366,801.40      0.37
2,000,000   732,562.40      0.37
4,000,000   1,458,913.60    0.36
* These measurements are inflated because they also include some lightweight work done before the rows are inserted.
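For reference, the steady-state rows of the table can be restated as throughput. This is only arithmetic on the reported means (no new measurements), using the 4,000,000-row run:

```latex
\[
\text{throughput} = \frac{N}{T}
  = \frac{4{,}000{,}000\ \text{rows}}{1{,}458.9136\ \text{s}}
  \approx 2{,}742\ \text{rows/s},
\qquad
\frac{T}{N} = \frac{1{,}458{,}913.6\ \text{ms}}{4{,}000{,}000\ \text{rows}}
  \approx 0.36\ \text{ms/row}
\]
```

The per-row cost stays at roughly 0.36 to 0.38 ms from 100,000 rows upward, so total time grows linearly with the number of rows.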
Any hints, tips, or documentation are appreciated. Thank you!
Do not call GetField to retrieve the key; return it directly from BuildRow (i.e. have BuildRow return KeyValuePair<string, IBinaryObject>).
Parallelise the insertion (and the BuildRow calls):
// ds is the IDataStreamer opened in the using block from the question.
Parallel.ForEach(rows, r =>
{
    KeyValuePair<string, IBinaryObject> pair = BuildRow(r);
    ds.AddData(pair);
});
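Put together, the first two suggestions might look like the sketch below. It is only an illustration under stated assumptions: SourceRow, its properties, the "Row" binary type name, and the ParallelLoader class are hypothetical (the question does not show the row shape or BuildRow), and it assumes the binary builder can be used from several threads at once.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Binary;
using Apache.Ignite.Core.Datastream;

// Hypothetical source row type; the real row shape is not shown in the question.
public class SourceRow
{
    public string Id { get; set; }
    public string Name { get; set; }
    public int Quantity { get; set; }
}

public static class ParallelLoader
{
    // Builds the binary object and returns it together with its key,
    // so the caller never has to read the key back with GetField.
    private static KeyValuePair<string, IBinaryObject> BuildRow(IBinary binary, SourceRow row)
    {
        IBinaryObject obj = binary.GetBuilder("Row") // "Row" is a placeholder type name
            .SetField("Id", row.Id)
            .SetField("Name", row.Name)
            .SetField("Quantity", row.Quantity)
            .Build();

        return new KeyValuePair<string, IBinaryObject>(row.Id, obj);
    }

    public static void Load(IIgnite ignite, string cacheName, IEnumerable<SourceRow> rows)
    {
        IBinary binary = ignite.GetBinary();

        // Disposing the streamer at the end of the using block flushes buffered entries.
        using (IDataStreamer<string, IBinaryObject> ds =
            ignite.GetDataStreamer<string, IBinaryObject>(cacheName))
        {
            // Row building and AddData calls are spread across worker threads.
            Parallel.ForEach(rows, row =>
            {
                ds.AddData(BuildRow(binary, row));
            });
        }
    }
}
```

Whether this helps depends on where the time goes: if BuildRow is cheap and the network or the server nodes are the bottleneck, the gain from extra client threads will be small, so it is worth measuring with the same benchmark as above.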
Run more Ignite nodes on more machines
If the rows come from an external data source, you can make every Ignite node load only its own part. You can do that by running the DataStreamer on each node via ICompute.Broadcast and, while iterating over the rows, checking whether the key belongs to the current node:
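// GetAffinity returns ICacheAffinity (namespace Apache.Ignite.Core.Cache) in Ignite.NET.
ICacheAffinity aff = m_ignite.GetAffinity(cacheName);
IClusterNode localNode = m_ignite.GetCluster().GetLocalNode();
Parallel.ForEach(rows, r =>
{
    string key = GetKey(r);
    // Stream only the rows whose primary copy lives on this node.
    if (aff.IsPrimary(localNode, key))
    {
        KeyValuePair<string, IBinaryObject> pair = BuildRow(r);
        ds.AddData(pair);
    }
});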
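To show how the loop above could be wrapped for ICompute.Broadcast, here is a rough sketch. LoadLocalRowsAction and RowSource are hypothetical names, the string[] row shape and the two sample rows are placeholders for whatever the external source returns, and "Row" is a made-up binary type name. It also assumes the assembly containing these classes is available on every node.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Binary;
using Apache.Ignite.Core.Cache;
using Apache.Ignite.Core.Cluster;
using Apache.Ignite.Core.Compute;
using Apache.Ignite.Core.Resource;

// Hypothetical stand-in for the external data source; each row is { key, value }.
public static class RowSource
{
    public static IEnumerable<string[]> ReadAll()
    {
        yield return new[] { "k1", "first" };
        yield return new[] { "k2", "second" };
    }

    public static string GetKey(string[] row)
    {
        return row[0];
    }

    public static KeyValuePair<string, IBinaryObject> BuildRow(IBinary binary, string[] row)
    {
        IBinaryObject obj = binary.GetBuilder("Row") // placeholder type name
            .SetField("Id", row[0])
            .SetField("Value", row[1])
            .Build();
        return new KeyValuePair<string, IBinaryObject>(row[0], obj);
    }
}

[Serializable]
public class LoadLocalRowsAction : IComputeAction
{
    // Injected by Ignite on the node that executes the action, so it is not serialized.
    [InstanceResource, NonSerialized]
    private IIgnite _ignite;

    public string CacheName { get; set; }

    public void Invoke()
    {
        ICacheAffinity aff = _ignite.GetAffinity(CacheName);
        IClusterNode localNode = _ignite.GetCluster().GetLocalNode();
        IBinary binary = _ignite.GetBinary();

        using (var ds = _ignite.GetDataStreamer<string, IBinaryObject>(CacheName))
        {
            // Every node scans the whole source but streams only its own primary keys.
            Parallel.ForEach(RowSource.ReadAll(), r =>
            {
                string key = RowSource.GetKey(r);
                if (aff.IsPrimary(localNode, key))
                {
                    ds.AddData(RowSource.BuildRow(binary, r));
                }
            });
        }
    }
}
```

The caller would then start the load on all nodes with something like m_ignite.GetCompute().Broadcast(new LoadLocalRowsAction { CacheName = CacheName });. Note that every node still reads the full source in this sketch; the saving is in building and sending only the locally owned rows.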