We run a production web server built on IBM WebSphere, Spring IoC, Spring MVC, and Hazelcast. Hazelcast serves as the Spring session implementation, and a single scheduled thread performs a health check every 60 seconds. During a large Excel export job, many OutOfMemoryErrors were thrown. But even after the job completed successfully, most requests to the server still fail with a stack trace like this:
[LargeThreadPool-thread-6481] ERROR xxxxFilter Exception captured in filter scope: org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:982)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
xxx
(Spring filter stack)
xxx
(IBM WebSphere stack)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.lang.Class.getDeclaredFieldsImpl
at java.lang.Class.getDeclaredFields
at java.io.ObjectStreamClass.getDefaultSerialFields
xxx
at com.hazelcast.client.impl.proxy.ClientMapProxy.setAsync
xxx
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run
WebSphere dumped four .phd heap dumps and javacore files during the job execution; no further dumps were produced after the job completed.
I debugged the Hazelcast code: it does catch the OOME to do some client lifecycle management, but it does not rethrow it or propagate it anywhere another thread could observe it.
My question: the scheduled thread should be dead after the OOME happened. How can that thread's execution repeatedly show up as the cause of failures on other threads?
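Part of the answer is how `ScheduledThreadPoolExecutor` handles throwables: the worker thread does not die when a task fails. If a periodic task's `run` throws, the executor suppresses subsequent executions of that task, but the pool thread itself survives; and if the task catches the error internally (as the Hazelcast client code does), the schedule simply keeps firing on the same thread. A minimal sketch, using a `RuntimeException` as a stand-in for the real `OutOfMemoryError`:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledErrorDemo {
    // Returns true if the periodic task fired at least 3 times even though
    // it throws on every run: the worker thread is not killed by the error.
    public static boolean runsSurviveCaughtError() throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch fired = new CountDownLatch(3);
        ses.scheduleAtFixedRate(() -> {
            try {
                throw new RuntimeException("simulated error"); // stand-in for the OOME
            } catch (RuntimeException e) {
                // swallowed inside the task, as Hazelcast's client code does
            }
            fired.countDown(); // still reached: the worker thread did not die
        }, 0, 20, TimeUnit.MILLISECONDS);
        boolean keptRunning = fired.await(2, TimeUnit.SECONDS);
        ses.shutdownNow();
        return keptRunning;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("periodic task kept running: " + runsSurviveCaughtError());
    }
}
```

So the health-check thread was never dead; it kept running and hitting the same exhausted heap every 60 seconds, which is consistent with its frames appearing as the `Caused by` of failures on request threads.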
There are two possible fixes. The easiest is to increase the heap (https://stackoverflow.com/a/69348545/175554). The better one is to change the export/scheduler code so it cannot exhaust memory in the first place: page through the data and process it block by block.
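The block-by-block idea can be sketched as follows; `fetchPage` and `writeBlock` are hypothetical placeholders for the real query and Excel-writing code, not the actual APIs in this project:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

public class PagedExporter {
    static final int PAGE_SIZE = 1000;

    // Processes rows one page at a time: at most PAGE_SIZE rows are held
    // in memory at once, instead of materializing the whole result set.
    public static int export(IntFunction<List<String>> fetchPage,  // hypothetical data source
                             Consumer<List<String>> writeBlock) {  // hypothetical sink
        int page = 0, total = 0;
        while (true) {
            List<String> rows = fetchPage.apply(page++);  // e.g. a LIMIT/OFFSET query
            if (rows.isEmpty()) break;
            writeBlock.accept(rows);  // write the block, then let it become garbage
            total += rows.size();
        }
        return total;
    }

    // Demo against an in-memory "table" of 2500 rows.
    public static int demoExport() {
        List<String> table = new ArrayList<>();
        for (int i = 0; i < 2500; i++) table.add("row-" + i);
        return export(
            p -> table.subList(Math.min(p * PAGE_SIZE, table.size()),
                               Math.min((p + 1) * PAGE_SIZE, table.size())),
            block -> { /* e.g. flush rows to a streamed sheet */ });
    }

    public static void main(String[] args) {
        System.out.println("exported " + demoExport() + " rows");
    }
}
```

For the Excel side specifically, a streaming writer (for example Apache POI's SXSSF workbook, which keeps only a sliding window of rows in memory) pairs naturally with this kind of paged fetch.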