performance, ignite

Enrich each existing value in a cache with the data from another cache in an Ignite cluster


What is the most performant way to update a field of each existing value in an Ignite cache with data from another cache in the same cluster (tens of millions of records, about a kilobyte each)?

Pseudo code:

try (mappings = getCache("mappings")) {
    try (entities = getCache("entities")) {
        entities.forEach((key, entity) ->
            entity.setInternalId(mappings.getValue(entity.getExternalId())));
    }
}

Solution

  • I would advise using compute and sending a closure to all the nodes in the cache topology. Then, on each node, you would iterate over the local primary entries and perform the updates. Even with this approach, you are still better off batching the updates and issuing them with a putAll call (or perhaps using an IgniteDataStreamer).

    NOTE: for the example below, it is important that keys in the "mappings" and "entities" caches are either identical or colocated. More information on affinity colocation is available here: https://apacheignite.readme.io/docs/affinity-collocation

    The pseudo code would look something like this:

    ClusterGroup cacheNodes = ignite.cluster().forDataNodes("mappings");
    
    IgniteCompute compute = ignite.compute(cacheNodes);
    
    compute.broadcast(() -> {
        // Obtain the Ignite instance on the node executing the closure.
        Ignite localIgnite = Ignition.localIgnite();
    
        IgniteCache<K, V1> mappings = localIgnite.cache("mappings");
        IgniteCache<K, V2> entities = localIgnite.cache("entities");
    
        // Iterate over local primary entries only, so every entry is
        // processed exactly once across the cluster.
        entities.localEntries(CachePeekMode.PRIMARY).forEach(entry -> {
           V1 mappingVal = mappings.get(entry.getKey());
           V2 entityVal = entry.getValue();
    
           V2 newEntityVal = // do enrichment;
    
           // It would be better to collect updates into a batch and flush
           // it with putAll(...); a simple put is used here for brevity.
           entities.put(entry.getKey(), newEntityVal);
        });
    });
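
    The batching idea mentioned in the comments can be sketched as a small, self-contained helper: updates accumulate in a local map and are flushed in chunks through a supplied consumer. With Ignite, that consumer would simply be `entities::putAll`; the class name `BatchWriter` and the batch size are illustrative assumptions, not part of the Ignite API.

    ```java
    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Consumer;

    // Sketch of the batching pattern: instead of one put per entry,
    // collect updates locally and flush them in chunks. The flush
    // consumer is pluggable; with Ignite it would be cache::putAll.
    public class BatchWriter<K, V> {
        private final int batchSize;                 // illustrative threshold
        private final Consumer<Map<K, V>> flushFn;   // e.g. entities::putAll
        private final Map<K, V> batch = new HashMap<>();

        public BatchWriter(int batchSize, Consumer<Map<K, V>> flushFn) {
            this.batchSize = batchSize;
            this.flushFn = flushFn;
        }

        // Queue an update; flush automatically when the batch is full.
        public void add(K key, V val) {
            batch.put(key, val);
            if (batch.size() >= batchSize)
                flush();
        }

        // Push any remaining entries; call once after the iteration ends.
        public void flush() {
            if (!batch.isEmpty()) {
                flushFn.accept(new HashMap<>(batch));
                batch.clear();
            }
        }
    }
    ```

    Inside the broadcast closure you would replace the per-entry `entities.put(...)` with `writer.add(entry.getKey(), newEntityVal)` and call `writer.flush()` after the loop, so each node issues a handful of bulk writes instead of millions of single puts.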