
Apache Ignite: How can I improve insertion performance?


What else can I do, beyond using IDataStreamer and IBinaryObject, to decrease insertion time into Apache Ignite.NET? Is it possible to get a significant performance increase, or is this as good as it will get?

I'm using:

  • .NET
  • 41 Query Fields: 1 string field and 40 float fields per row
  • IBinaryObject / WithKeepBinary
  • IDataStreamer
  • Default JVM settings
  • Partitioned Cache
  • No Persistence

I used this example as a starting point: https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/examples/Apache.Ignite.Examples/Datagrid/DataStreamerExample.cs

Here's my usage of the IDataStreamer:

using (var ds = m_ignite.GetDataStreamer<string, IBinaryObject>(CacheName)) {
    foreach (var binaryRow in rows.Select(r => BuildRow(r))) {
        var key = binaryRow.GetField<string>(PrimaryKeyName);
        ds.AddData(key, binaryRow);
    }
}

Performance results (5 nodes, all with the same specifications):

BenchmarkDotNet=v0.10.8, OS=Windows 8.1 (6.3.9600)
Processor=Intel Xeon CPU E5-2698 v4 2.20GHz Intel Xeon CPU E5-2698 v4 2.20GHz, ProcessorCount=4
Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET
  [Host]     : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.2053.0
  Job-UZDKMF : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.2053.0

RunStrategy=Monitoring  TargetCount=1

NumRows      Mean (ms)      Per Row (ms/row) 
10           359.50*        35.95* 
100          465.50*        4.66* 
1,000        797.80*        0.80* 
10,000       4,479.80       0.45 
100,000      37,611.60      0.38 
500,000      184,640.00     0.37 
1,000,000    366,801.40     0.37 
2,000,000    732,562.40     0.37 
4,000,000    1,458,913.60   0.36

* These measurements are inflated because they also include some lightweight work done before the rows are inserted.
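The Per Row column is Mean divided by NumRows, and inverting it gives throughput, which makes "is this as good as it will get?" easier to judge. A small sketch of that arithmetic (the class and method names are illustrative only, not part of Ignite):

```csharp
using System;

// Illustrative helpers for reading the benchmark table above.
public static class StreamerThroughput
{
    // Milliseconds per row, as in the "Per Row (ms/row)" column: Mean / NumRows.
    public static double MsPerRow(long numRows, double meanMs) => meanMs / numRows;

    // Rows inserted per second across the whole cluster: NumRows / (Mean in seconds).
    public static double RowsPerSecond(long numRows, double meanMs) => numRows / (meanMs / 1000.0);
}
```

For the 4,000,000-row run this works out to about 2,742 rows per second for the whole five-node cluster, and the per-row cost stays between 0.36 and 0.38 ms from 100,000 rows upward.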

Any hints, tips, or documentation is appreciated. Thank you!


Solution

    1. Do not call GetField to retrieve the key; return it directly from BuildRow instead (i.e. return a KeyValuePair<string, IBinaryObject>), so the key is not read back out of the binary object for every row.

    2. Parallelise the insertion (and BuildRow calls):

      // ds is the IDataStreamer from the question; BuildRow returns the key together with the value
      Parallel.ForEach(rows, r => 
      {
          KeyValuePair<string, IBinaryObject> pair = BuildRow(r);
          ds.AddData(pair);
      });
      
    3. Run more Ignite nodes on more machines

    4. If the rows come from an external data source, you can make every Ignite node load only its own part of the data. To do that, run the loading code on every node via ICompute.Broadcast and, while iterating over the rows, check whether each key belongs to (is primary on) the current node:

      ICacheAffinity aff = m_ignite.GetAffinity(cacheName);
      IClusterNode localNode = m_ignite.GetCluster().GetLocalNode();
      Parallel.ForEach(rows, r => 
      {
          // Extract only the key first, without building the whole row
          string key = GetKey(r);
          // Build and stream the row only if its primary copy lives on this node
          if (aff.IsPrimary(localNode, key))
          {
              KeyValuePair<string, IBinaryObject> pair = BuildRow(r);
              ds.AddData(pair);
          }
      });
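The per-node loop from step 4, combined with step 1, can be sketched as a self-contained method. This is a minimal sketch, not the answer's code: the Ignite calls are passed in as delegates so the logic runs without a cluster (on a real node, isLocalPrimary would wrap aff.IsPrimary(localNode, key) and add would wrap ds.AddData(pair)), and the names LocalShareLoader, LoadLocalShare, getKey, buildRow, isLocalPrimary and add are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper illustrating step 4 (plus step 1) without an Ignite dependency.
public static class LocalShareLoader
{
    // Builds and streams, in parallel, only the rows whose key is primary on this node.
    // Returns how many rows were passed to `add`.
    public static int LoadLocalShare<TRow, TValue>(
        IEnumerable<TRow> rows,
        Func<TRow, string> getKey,                          // cheap key lookup on the raw row
        Func<TRow, KeyValuePair<string, TValue>> buildRow,  // returns the key together with the value (step 1)
        Func<string, bool> isLocalPrimary,                  // assumed wiring: key => aff.IsPrimary(localNode, key)
        Action<KeyValuePair<string, TValue>> add)           // assumed wiring: pair => ds.AddData(pair)
    {
        int added = 0;

        Parallel.ForEach(rows, r =>
        {
            // Skip rows owned by other nodes before paying for buildRow.
            if (!isLocalPrimary(getKey(r)))
                return;

            // The key comes straight from buildRow; nothing is read back from the value.
            add(buildRow(r));
            Interlocked.Increment(ref added);
        });

        return added;
    }
}
```

On a real cluster this body would run on each server node, for example inside an action sent with ICompute.Broadcast. Note that add is called from several threads at once, which is the same assumption the answer's own Parallel.ForEach usage makes about ds.AddData. Filtering before buildRow means each node only pays the serialization cost for the rows it keeps.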