Tags: spring, hazelcast, jcache

JCache Hazelcast embedded does not scale


Hello, Stackoverflow Community.

I have a Spring Boot application that uses JCache with Hazelcast as the underlying cache implementation.

Each Hazelcast node has 5 caches with 50,000 elements each. There are 4 Hazelcast instances that form a cluster.

The problem that I face is the following:

I have a very heavy call that reads data from all five caches. On the initial start, when all caches are still empty, this call takes up to 600 seconds.

When there is one Hazelcast instance running and all 5 caches are filled with data, the call is relatively fast: it takes only 4 seconds on average.

When I start 2 Hazelcast instances and they form a cluster, the response time gets worse, and the same call takes 25 seconds on average.

And the more Hazelcast instances I add to the cluster, the longer the response time gets. Of course, I was expecting somewhat worse response times once the data is partitioned among the Hazelcast nodes in a cluster. But I did not expect that just adding one more Hazelcast instance would make the response time 6-7 times slower...

Please note that, for simplicity and testing purposes, I just start four Spring Boot instances, each with an embedded Hazelcast node, on one machine. Therefore, such poor performance cannot be explained by network delays. I assume that this API call is so slow even with Hazelcast because a lot of data needs to be serialized/deserialized when it is sent between Hazelcast cluster nodes. Please correct me if I am wrong.

The cache data is partitioned evenly among all nodes. I was thinking about adding a Near Cache in order to reduce latency; however, according to the Hazelcast documentation, the Near Cache is not available for JCache members. In my case, because of some project requirements, I am not able to switch to JCache clients to make use of the Near Cache. Is there maybe some advice on how to reduce latency in such a scenario?

Thank you in advance.


DUMMY CODE SAMPLES TO DEMONSTRATE THE PROBLEM:

  1. Hazelcast Config: stays default, nothing is changed
  2. Caches:
private void createCaches() {
    CacheConfiguration<?, ?> cacheConfig = new CacheConfig<>()
            .setEvictionConfig(
                    new EvictionConfig()
                            .setEvictionPolicy(EvictionPolicy.LRU)
                            .setSize(150000)
                            .setMaxSizePolicy(MaxSizePolicy.ENTRY_COUNT)
            )
            .setBackupCount(5)
            .setInMemoryFormat(InMemoryFormat.OBJECT)
            .setManagementEnabled(true)
            .setStatisticsEnabled(true);
    cacheManager.createCache("books", cacheConfig);
    cacheManager.createCache("bottles", cacheConfig);
    cacheManager.createCache("chairs", cacheConfig);
    cacheManager.createCache("tables", cacheConfig);
    cacheManager.createCache("windows", cacheConfig);
}
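
Note: cacheManager above is assumed to be the standard javax.cache.CacheManager; a minimal sketch of how it could be obtained (this wiring is an assumption, not part of the original setup):

import javax.cache.CacheManager;
import javax.cache.Caching;

// With Hazelcast on the classpath, the JCache SPI resolves the
// Hazelcast caching provider and returns its CacheManager.
CacheManager cacheManager = Caching.getCachingProvider().getCacheManager();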

  3. Dummy Controller:
@GetMapping("/dummy_call")
public String getExampleObjects() { // simulates a situation where one call needs to fetch data from multiple cached sources
    Instant start = Instant.now();
    int i = 0;
    while (i != 50000) {
        exampleService.getBook(i);
        exampleService.getBottle(i);
        exampleService.getChair(i);
        exampleService.getTable(i);
        exampleService.getWindow(i);
        i++;
    }
    Instant end = Instant.now();
    // %d, not %o: %o would print the elapsed seconds in octal
    return String.format("The heavy call took: %d seconds", Duration.between(start, end).getSeconds());
}

  4. Dummy Service:
@Service
public class ExampleService {

    @CacheResult(cacheName = "books")
    public Book getBook(int i) {
        try {
            Thread.sleep(1); // just to simulate a slow service here!
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Book(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "bottles")
    public Bottle getBottle(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Bottle(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "chairs")
    public Chair getChair(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Chair(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "tables")
    public Table getTable(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Table(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "windows")
    public Window getWindow(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Window(Integer.toString(i), Integer.toString(i));
    }
}

Solution

  • If you do the math:

    4 s / 250,000 lookups (5 caches × 50,000 keys) is 0.016 ms per local lookup. This seems rather high, but let's take that.

    When you add a second node, the data gets partitioned and half of the requests are served from the other node. If you add 2 more nodes (4 in total), then only 25% of the requests are served locally and 75% are served over the network. This should explain why the response time grows as you add more nodes.

    Even a simple ping on localhost takes twice that time or more. On a real network, the read latency we see in benchmarks is 0.3-0.4 ms per read call. This makes:

    0.25 × 250,000 × 0.016 ms + 0.75 × 250,000 × 0.3 ms ≈ 57 s

    You simply won't be able to make that many calls serially over the network (even a local one). You need to either:

    • parallelize or batch the calls - use javax.cache.Cache#getAll to reduce the number of network round trips (see the sketch below), or
    • try enabling reads from local backups via com.hazelcast.config.MapConfig#setReadBackupData so that fewer requests go over the network.
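
    For the first option, a minimal sketch of batching the reads with javax.cache.Cache#getAll; the typed getCache call and the key range are assumptions for illustration:

        import java.util.Map;
        import java.util.Set;
        import java.util.stream.Collectors;
        import java.util.stream.IntStream;

        import javax.cache.Cache;

        // One bulk request per cache instead of 50 000 serial get() calls;
        // Hazelcast can serve it in far fewer network round trips.
        Cache<Integer, Book> booksCache = cacheManager.getCache("books", Integer.class, Book.class);
        Set<Integer> keys = IntStream.range(0, 50_000).boxed().collect(Collectors.toSet());
        Map<Integer, Book> books = booksCache.getAll(keys);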

    The read backup data feature is only available for IMap, so you would need to switch to Spring caching with the hazelcast-spring module and its com.hazelcast.spring.cache.HazelcastCacheManager:

        @Bean
        HazelcastCacheManager cacheManager(HazelcastInstance hazelcastInstance) {
            return new HazelcastCacheManager(hazelcastInstance);
        }
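
    Read backup data itself is switched on in the map configuration. A sketch, assuming the cached data is moved to an IMap named "books"; with 4 members, a backup count of 3 means every member holds a full copy, so reads can stay local:

        import com.hazelcast.config.Config;
        import com.hazelcast.config.InMemoryFormat;
        import com.hazelcast.config.MapConfig;

        Config config = new Config();
        config.addMapConfig(new MapConfig("books")
                .setBackupCount(3)            // 1 owner + 3 backups = a copy on each of the 4 members
                .setReadBackupData(true)      // let get() read the local backup copy instead of going remote
                .setInMemoryFormat(InMemoryFormat.OBJECT));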
    

    See the documentation for more details.