
Correct place to catch out of memory error


I'm experiencing a problem with a producer-consumer setup for a local bot competition (think Scalatron, but with more languages allowed, and using pipes to connect to stdin and stdout). The items are produced fine and handled correctly by the consumer. However, the consumer's task in this setting is to call other pieces of software that might use too much memory, hence the out-of-memory error.

I've got a Python script (i.e. the consumer) that continuously calls other pieces of code using subprocess.call. These are all submitted by other people for evaluation. Sometimes, however, one of these submissions uses so much memory that the engine throws an OutOfMemoryError, which causes the entire script to halt.

There are three layers in the used setup:

  • Consumer (Python)
  • Game engine (Java)
  • Players' bots (languages differ)

The consumer calls the game engine using two bots as arguments:
subprocess.call(['setsid', 'sudo', '-nu', 'botrunner', '/opt/bots/sh/run_bots.sh', bot1, bot2]).

Inside the game engine a loop runs pitting the bots against each other, and afterwards all data is saved in a database so players can review their bots. The idea is, should a bot cause an error, to log the error and hand victory to the opponent.

What is the correct place to catch this, though? Should this be done on the "highest" (i.e. consumer) level, or in the game engine itself?


Solution

  • The correct place to catch any Exception or Error in Java is the place where you have a mechanism to handle it and perform some recovery steps. In the case of OutOfMemoryError, you should catch the error ONLY when you are able to shut down gracefully: cleanly releasing resources and, if possible, logging the reason for the failure.

    OutOfMemoryError is thrown when a memory allocation cannot be satisfied with the space remaining in the heap. When it is thrown, the objects that were reachable before the failed allocation attempt are still reachable. That is the moment to catch the OutOfMemoryError and drop references to run-time objects, freeing memory that the cleanup may still need.

    If the JVM is still in a repairable state, it is even possible to recover and continue after the error. This is nevertheless generally considered poor design, because a program can never reliably determine whether the JVM is in such a state.

    The documentation of java.lang.Error says:

    An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch.

    If you do catch an Error on purpose, remember NOT to blanket-catch with catch (Throwable t) {...} everywhere in your code.

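In the setup from the question, the advice to catch OutOfMemoryError only where the program can still do something useful points at the engine's match loop. That loop knows whose turn it was, so it can log the failure and award the match to the opponent.

Below is a minimal sketch under some assumptions. MatchRunner, play and the per-turn Runnables are hypothetical names standing in for the real engine code. Blaming whichever bot was moving is itself an assumption, because the allocation that fails need not be the one that exhausted the heap. The pipes also suggest that each bot runs as a separate process. If so, a bot's own memory does not count against the engine's JVM heap, and an OutOfMemoryError in the engine comes from something the engine itself allocated, for example while buffering a bot's output.

```java
import java.util.logging.Logger;

public class MatchRunner {
    private static final Logger LOG = Logger.getLogger(MatchRunner.class.getName());

    /**
     * Plays up to maxTurns turns, alternating between bot A and bot B, and
     * returns the winner's name, or "draw" if no turn fails.
     * turnA and turnB are hypothetical stand-ins for whatever the engine does
     * on each bot's turn (writing to its stdin, reading its stdout, updating state).
     */
    public static String play(String nameA, Runnable turnA,
                              String nameB, Runnable turnB, int maxTurns) {
        for (int turn = 0; turn < maxTurns; turn++) {
            boolean aToMove = (turn % 2 == 0);
            String mover = aToMove ? nameA : nameB;
            String opponent = aToMove ? nameB : nameA;
            try {
                (aToMove ? turnA : turnB).run();
            } catch (OutOfMemoryError e) {
                // This level can still act on the failure: log it and hand the
                // win to the other bot. The handler is kept small because the
                // heap may still be almost full at this point.
                LOG.severe("OutOfMemoryError during " + mover
                        + "'s turn; awarding match to " + opponent);
                return opponent;
            }
        }
        return "draw"; // hypothetical outcome when no turn fails
    }
}
```

For example, play("a", () -> {}, "b", () -> { throw new OutOfMemoryError("simulated"); }, 4) returns "a": bot a's first turn succeeds, and the error during bot b's first turn ends the match in a's favour.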
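The advice to drop references to run-time objects after catching OutOfMemoryError can look like the following minimal sketch. It assumes, hypothetically, that the engine buffers every move of a match in memory before saving it to the database. ReplayRecorder and its methods are illustrative names, not part of any real engine.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplayRecorder {
    // Hypothetical buffer: every move of the current match, kept for the database.
    private List<String> moves = new ArrayList<>();

    // Must not be called after run() has reported a failure (moves is then null).
    public void record(String move) {
        moves.add(move);
    }

    /**
     * Runs the match work. If it throws OutOfMemoryError, the reference to the
     * buffered moves is dropped first, so the garbage collector can reclaim that
     * memory, and only then does cleanupAfterFailure run (e.g. closing pipes,
     * writing a short log line). Returns true if the work completed normally.
     */
    public boolean run(Runnable work, Runnable cleanupAfterFailure) {
        try {
            work.run();
            return true;
        } catch (OutOfMemoryError e) {
            moves = null; // the buffered moves are now unreachable
            cleanupAfterFailure.run();
            return false;
        }
    }

    public boolean hasBufferedMoves() {
        return moves != null;
    }
}
```

Setting the field to null does not free anything immediately. It only makes the buffer unreachable so that the next garbage collection can reclaim it. After a failure, the recorder should be discarded rather than reused.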
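The warning against a blanket catch (Throwable t) means catching only the failures a method can actually handle and letting everything else propagate. Here is a minimal sketch of that narrower style. It assumes the engine reads each move as one line from a bot's stdout pipe, which is a hypothetical protocol, and BotIo and readMoveOrNull are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;

public class BotIo {
    /**
     * Reads one line (one move) from a bot's stdout and returns it, or null if
     * the bot closed its output or the pipe failed.
     * Only IOException is caught: a broken pipe is something this method can
     * react to, by reporting that the bot gave no move. Errors such as
     * OutOfMemoryError are deliberately not caught here; they propagate to the
     * one place that can handle them.
     */
    public static String readMoveOrNull(BufferedReader fromBot) {
        try {
            return fromBot.readLine(); // null at end of stream
        } catch (IOException e) {
            return null; // treat a failed pipe like a bot that stopped answering
        }
    }
}
```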