Cache partition not replicated


I have two Apache Ignite nodes with persistence enabled. I create a cache like this:

    // all the queues across the frontier instances
    CacheConfiguration cacheCfg2 = new CacheConfiguration("queues");
    cacheCfg2.setBackups(backups);
    cacheCfg2.setCacheMode(CacheMode.PARTITIONED);
    globalQueueCache = ignite.getOrCreateCache(cacheCfg2);

where backups is a value greater than 1.

When one of the nodes dies, I get:

Exception in thread "Thread-2" javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=queues, part=2]
    at org.apache.ignite.internal.processors.cache.query.GridCacheQueryAdapter.executeScanQuery(GridCacheQueryAdapter.java:597)
    at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl$1.applyx(IgniteCacheProxyImpl.java:519)
    at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl$1.applyx(IgniteCacheProxyImpl.java:517)
    at org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)
    at org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:3482)
    at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:516)
    at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:843)
    at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:418)
    at crawlercommons.urlfrontier.service.ignite.IgniteService$QueueCheck.run(IgniteService.java:270)
Caused by: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=queues, part=2]
    ... 9 more

I expected the content to have been replicated onto the other node. Why isn't that the case?


Solution

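  • Most likely there is a misconfiguration somewhere. Check the following:

    • Make sure you are not working with an existing cache. getOrCreateCache returns an already existing cache without checking that its configuration matches the one you pass in, so a new backups value is silently ignored. Replace getOrCreateCache with createCache, which fails if the cache already exists.
    • Make sure you do not have more server nodes than the backup factor.
    • Inspect the logs for the "Detected lost partitions" message and for what happened before it.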
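
To see why the first check matters, here is a minimal sketch of the arithmetic behind a lost partition. It is a simplified model, not Ignite code: the class and method names (BackupModel, effectiveCopies, survives) are made up for illustration, and it assumes a partition is lost only once every node holding a copy of it has left. On a real cluster you could read the backups value actually in effect with cache.getConfiguration(CacheConfiguration.class).getBackups() and list the affected partitions with cache.lostPartitions().

```java
// Simplified model, not Ignite code: every name here is hypothetical.
class BackupModel {

    // Copies of a partition the cluster can hold: one primary plus the
    // configured backups, capped by the number of server nodes.
    static int effectiveCopies(int backups, int serverNodes) {
        return Math.min(backups + 1, serverNodes);
    }

    // Assumption: a partition stays available while at least one of the
    // nodes holding a copy of it is still alive.
    static boolean survives(int backups, int serverNodes, int failedOwners) {
        return failedOwners < effectiveCopies(backups, serverNodes);
    }

    public static void main(String[] args) {
        // Two nodes, backups = 2 as intended: one node can die safely.
        System.out.println(survives(2, 2, 1)); // prints "true"
        // Two nodes, but an already existing cache was created with backups = 0.
        System.out.println(survives(0, 2, 1)); // prints "false"
    }
}
```

If the second case matches what you see, the "queues" cache was probably created earlier with fewer backups than the code now asks for, and the configuration passed to getOrCreateCache never took effect.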