java multithreading spring out-of-memory restart

Java Spring: Rerun a dead thread after OutOfMemoryError


I have developed an application with Spring. One of my beans creates a thread, but while this thread is running, the JVM throws an OutOfMemoryError (Java heap space).
What I would like to ask is whether the following solution would be suitable. Once the error is thrown, the thread dies and the memory it occupied is released. Another thread (which I call RestartThread) then notices that the first thread is dead (without catching the error) and:
1) calls the garbage collector, which actually frees the memory of the dead thread;
2) calls the run() method of the dead thread again, which restarts the previous instance of that thread (including the private variables it used, which remain in memory even after the OutOfMemoryError).

What do you think of this approach? Could it cause problems? Is it correct to restart the previous instance of a dead thread?

Thanks in advance,
--Alucard


Solution

  • Recovering from an OutOfMemoryError, especially in a multi-threaded environment, is very difficult and often impossible. You should probably find out why you run out of memory (for example, leaked references, or an application that simply needs more memory than you have given it) and fix that, rather than trying to recover from the error.

    Even if you could just let the thread that threw the error die and then restart it, the restarted thread would probably die again right at the beginning. In a worse scenario, the root cause could be in some other part of the program. Other threads in your application would then start throwing the same error as they try to allocate new objects, the error would cascade through the application, and finally the whole thing would come crashing down spectacularly.
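
    As a minimal sketch of that point, the following simulates a worker whose Runnable always fails with an OutOfMemoryError and restarts it in a fresh thread a bounded number of times. The class and method names (RestartSketch, runWithRestarts, canRestart) are hypothetical, and the error is thrown by hand rather than caused by real heap exhaustion. It also shows that a terminated Thread object cannot be started again, so any "restart" needs a new Thread.

    ```java
    import java.util.concurrent.atomic.AtomicInteger;

    /** Hypothetical illustration; none of these names come from the original post. */
    class RestartSketch {

        /**
         * Runs the task in a fresh thread, up to maxAttempts times, and
         * returns how many attempts died with an OutOfMemoryError.
         */
        static int runWithRestarts(Runnable task, int maxAttempts) throws InterruptedException {
            AtomicInteger deaths = new AtomicInteger();
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                int before = deaths.get();
                Thread worker = new Thread(task, "worker-" + attempt);
                // The handler runs in the dying thread, before it terminates.
                worker.setUncaughtExceptionHandler((t, e) -> {
                    if (e instanceof OutOfMemoryError) {
                        deaths.incrementAndGet();
                    }
                });
                worker.start();
                worker.join();
                if (deaths.get() == before) {
                    break; // the task finished without an OutOfMemoryError
                }
            }
            return deaths.get();
        }

        /** Returns false if the given thread refuses to be started (again). */
        static boolean canRestart(Thread thread) {
            try {
                thread.start();
                return true;
            } catch (IllegalThreadStateException e) {
                return false;
            }
        }
    }
    ```

    If the root cause is still there (here, a task that always throws), every restart ends the same way, so a cap on the number of attempts is the least such a supervisor would need.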

    Memory isn't your only problem. Your application state can be pretty much anything (i.e. inconsistent) if the thread terminated by the OOME was in the middle of processing and had touched the state of shared objects that other threads also use. A synchronized monitor is released when the error propagates out of the block, but the data it guarded may be left half-updated. Also, if another thread was waiting for something the terminated thread was supposed to provide (a notify() for its wait(), or an explicit lock that was not released in a finally block), that thread could block forever. In most cases, writing the recovery logic and checking that the recovery was successful will be very difficult, as there are far too many variables and things to check before you can be sure the application has REALLY recovered.
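
    The state problem can be sketched as follows, assuming a hypothetical HalfDoneUpdateSketch class whose two fields are meant to always add up to 100 and a latch that the worker is supposed to release when it finishes. The OutOfMemoryError is again thrown by hand halfway through the update, purely to simulate a thread dying at an unlucky moment.

    ```java
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    /** Hypothetical illustration; none of these names come from the original post. */
    class HalfDoneUpdateSketch {
        // Intended invariant: from + to == 100
        private int from = 100;
        private int to = 0;
        // Other threads would wait on this for the update to complete.
        private final CountDownLatch done = new CountDownLatch(1);

        synchronized void transfer(int amount, boolean failMidway) {
            from -= amount;
            if (failMidway) {
                // Simulated failure, not a real heap exhaustion.
                throw new OutOfMemoryError("simulated");
            }
            to += amount;
            done.countDown();
        }

        synchronized int total() {
            return from + to;
        }

        /** Returns true only if the worker signalled completion in time. */
        boolean awaitDone(long millis) throws InterruptedException {
            return done.await(millis, TimeUnit.MILLISECONDS);
        }

        /** Runs one worker that dies in the middle of transfer(). */
        static HalfDoneUpdateSketch runFailingWorker() throws InterruptedException {
            HalfDoneUpdateSketch state = new HalfDoneUpdateSketch();
            Thread worker = new Thread(() -> state.transfer(30, true));
            worker.setUncaughtExceptionHandler((t, e) -> { /* ignored in this sketch */ });
            worker.start();
            worker.join();
            return state;
        }
    }
    ```

    After runFailingWorker(), total() is 70 instead of 100, and awaitDone() times out because the signal never arrives; a waiter without a timeout would hang. Restarting the worker fixes neither unless the recovery code knows how to repair both.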