java, ignite, gridgain

Ignite/GridGain generated project: open file limit issue


I'm trying to cache a large dataset from several tables. My server runs CentOS with 8 GB of RAM and 500 GB of disk space.

I configured my local storage policy to persist data. After hitting an open file limit issue, I tried raising the limit to 2,000,000 by following these steps:

 vi /etc/sysctl.conf
 fs.file-max = 2000000     (2 million)
 :wq
 sysctl -p

But even with this fix, and after running chmod -x on the work directory, I still get the following error:

SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to initialize partition file: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin]]
class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to initialize partition file: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:448)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:337)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:478)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:462)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:853)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:694)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.getOrAllocatePartitionMetas(GridCacheOffheapManager.java:1679)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.init0(GridCacheOffheapManager.java:1507)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2137)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:429)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4261)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3407)
    at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:771)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.loadEntry(GridDhtCacheAdapter.java:683)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.access$600(GridDhtCacheAdapter.java:103)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$5.apply(GridDhtCacheAdapter.java:633)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$5.apply(GridDhtCacheAdapter.java:629)
    at org.apache.ignite.internal.processors.cache.store.GridCacheStoreManagerAdapter$3.apply(GridCacheStoreManagerAdapter.java:535)
    at org.apache.ignite.cache.store.jdbc.CacheAbstractJdbcStore$1.call(CacheAbstractJdbcStore.java:469)
    at org.apache.ignite.cache.store.jdbc.CacheAbstractJdbcStore$1.call(CacheAbstractJdbcStore.java:433)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.FileSystemException: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin: Too many open files
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newAsynchronousFileChannel(UnixFileSystemProvider.java:196)
    at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:248)
    at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:301)
    at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.<init>(AsyncFileIO.java:56)
    at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory.create(AsyncFileIOFactory.java:43)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:420)
    ... 23 more

Nov 24, 2019 4:54:51 PM java.util.logging.LogManager$RootLogger log
SEVERE: JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to initialize partition file: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin]]

What can I do to fix it?


Solution

  • Adding the following configuration was enough for me to avoid this exception

    vi /etc/security/limits.conf
    root soft nofile 10240
    root hard nofile 20480
    

    Then, in /etc/sysctl.conf, I appended the inotify max watches setting:

    fs.inotify.max_user_watches=524288
    

    Here, root is my user account name.

    The values are experimental. I'm not sure whether they are safe, but I haven't had any notable issues in my VM.

    I kept the previous fs.file-max configuration.

    A reboot was needed.

    Credit to @Stephen Darlington

    Just to explain what's going on here: fs.file-max sets an overall limit for the operating system. The entries in limits.conf set limits for each user. The only other thing I would add is that if you're running Ignite as a user other than root (recommended), you'd change that user's limits.
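
Since the system-wide limit and the per-user limit are separate, it can help to check which limit the JVM itself ends up with. The following is only a minimal sketch, not part of the original answer: the class name FdLimitCheck is made up for illustration, and it relies on the JDK's com.sun.management.UnixOperatingSystemMXBean, which is only available on Unix-like JVMs such as the CentOS setup described above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper class (name chosen for this example only).
public class FdLimitCheck {

    /**
     * Returns {open, max} file descriptor counts for the current JVM process,
     * or null when the platform bean is not a Unix one.
     */
    public static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(), // descriptors currently open
                unix.getMaxFileDescriptorCount()   // per-process limit seen by this JVM
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fd = fdUsage();
        if (fd == null) {
            System.out.println("File descriptor counts are not available on this platform.");
            return;
        }
        System.out.println("open file descriptors: " + fd[0]);
        System.out.println("max file descriptors (per-process limit): " + fd[1]);
    }
}
```

Run it as the same user account that starts the Ignite node. If the reported maximum is still a low default rather than the nofile value from limits.conf, the per-user setting has probably not taken effect for that session yet.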