Tags: java, spring, websphere, hazelcast

Spring MVC request/response thread throws an Exception whose cause comes from another scheduler thread


We have a production web server built with IBM WebSphere, Spring IoC, Spring MVC, and Hazelcast. We use Hazelcast as the Spring session implementation. A single scheduler thread runs a health check every 60 seconds. During a large Excel export job, many OutOfMemoryErrors were thrown. But even after the job completed successfully, most requests to the server fail with a stack trace like this:

[LargeThreadPool-thread-6481] ERROR xxxxFilter Exception captured in filter scope: org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space
  at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:982)
  at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
  xxx
  Spring filter stack
  xxx
  IBM WebSphere stack
Caused by: java.lang.OutOfMemoryError: Java heap space
  at java.lang.Class.getDeclaredFieldsImpl
  at java.lang.Class.getDeclaredFields
  at java.io.ObjectStreamClass.getDefaultSerialFields
  xxx
  at com.hazelcast.client.impl.proxy.ClientMapProxy.setAsync
  xxx
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run
  

WebSphere dumped 4 PHD and javacore files during the job execution; no more dumps were produced after the job completed.

I tried to debug the Hazelcast code: it does catch the OOME to do some client lifecycle management, but it does not rethrow it or store it anywhere that another thread could access it.
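For reference, here is a minimal, self-contained sketch of what I mean by "somewhere another thread can access it" (my own illustration using CompletableFuture, not Hazelcast's actual internals): if the Error created on the scheduler thread were stored in a shared future, a request thread joining that future would see an exception whose cause is that same Error instance, still carrying the scheduler thread's stack trace.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CrossThreadCauseSketch {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<Void> healthCheck = new CompletableFuture<>();

        // Scheduler thread: creates the Error and stores it in the shared future
        // instead of letting it escape (a simulated OOME, not a real JVM one).
        scheduler.schedule(() -> {
            Error oom = new OutOfMemoryError("Java heap space");
            healthCheck.completeExceptionally(oom);
        }, 100, TimeUnit.MILLISECONDS);

        try {
            // "Request" thread (here: main) joins the future and gets an exception
            // whose cause is the very same Error instance, with the stack trace it
            // was given on the scheduler thread.
            healthCheck.join();
        } catch (CompletionException e) {
            System.out.println("caught on " + Thread.currentThread().getName());
            e.getCause().printStackTrace();
        } finally {
            scheduler.shutdown();
        }
    }
}
```

This is the kind of hand-off I was looking for in the Hazelcast code but could not find.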

My question: the scheduler thread should be dead after the OOME happened. How can its execution repeatedly show up as the cause of exceptions on other threads?
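While debugging I also checked whether a ScheduledThreadPoolExecutor thread really dies when a task throws. This small standalone demo (my own code, not WebSphere/Hazelcast) suggests it does not: the executor wraps every task in a FutureTask, which catches whatever the task throws, including Errors, and stores it as the task's outcome, so the worker thread survives and keeps running later tasks.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class SchedulerSurvivesErrorSketch {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Task 1: an Error escapes the task body (simulated OOME).
        Runnable failing = () -> {
            System.out.println("task 1 on " + Thread.currentThread().getName());
            throw new OutOfMemoryError("Java heap space");
        };
        ScheduledFuture<?> first = scheduler.schedule(failing, 0, TimeUnit.SECONDS);

        // The executor captured the Error as the task's outcome;
        // the worker thread did not die.
        try {
            first.get();
        } catch (ExecutionException e) {
            System.out.println("captured outcome: " + e.getCause());
        }

        // Task 2 still runs, on the very same pool thread as task 1.
        scheduler.schedule(() ->
                System.out.println("task 2 on " + Thread.currentThread().getName()),
                0, TimeUnit.SECONDS).get();

        scheduler.shutdown();
    }
}
```

So the pool thread behind ScheduledFutureTask.run is not necessarily dead after an OOME.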


Solution

  • There are two possibilities:

    1. A repeat/retry mechanism in the code.
    2. The scheduler runs many times and only sometimes runs out of memory, not on every run; it depends on the load on the application.

    The easiest solution would be to increase the heap (https://stackoverflow.com/a/69348545/175554); a better solution would be to change the scheduler code so that it handles the data without running out of memory, paging it and processing it block by block.
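As a sketch of the block-by-block idea, assuming the export reads rows from some repository or JDBC source (the RowSource and RowSink interfaces below are placeholders for whatever data access and output writer you actually use):

```java
import java.util.List;

/** Hypothetical paged export loop: keeps only one page of rows in memory at a time. */
public class PagedExporter {

    interface RowSource {
        /** Returns up to pageSize rows starting at offset; an empty list when exhausted. */
        List<String[]> fetchPage(int offset, int pageSize);
    }

    interface RowSink {
        /** Appends rows to the output (e.g. a streaming Excel/CSV writer) and forgets them. */
        void writeRows(List<String[]> rows);
    }

    public static void export(RowSource source, RowSink sink, int pageSize) {
        int offset = 0;
        while (true) {
            List<String[]> page = source.fetchPage(offset, pageSize);
            if (page.isEmpty()) {
                break;                 // no more data
            }
            sink.writeRows(page);      // write and release this block before fetching the next
            offset += page.size();
        }
    }
}
```

For Excel output specifically, a streaming writer such as Apache POI's SXSSFWorkbook keeps only a sliding window of rows in memory instead of the whole sheet.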