ubuntu memory virtual-machine swap kill-process

Are processes killed automatically if memory and swap run out?


I have an Ubuntu 16.04 virtual machine for computationally expensive jobs that run in parallel on the machine's 32 cores (one per core, assigned by GNU parallel). After runtimes of hours to days, I noticed that some cores had been freed up and the corresponding processes were no longer running. Memory (~100GB) and swap (~1GB) are also almost completely full according to htop. However, one process alone usually needs multiple GB.

What happened? Are the processes that are not actively running swapped out, to be continued later once more memory is available? Or were they simply killed because swap is also full?

I'd rather stop a process manually and retrieve the intermediate result than have processes killed and lose all results after days of computation. I can't increase memory while the jobs are running, but I just came across swapspace. Does it make sense to install it while the processes are still running, in the hope of automatically increasing swap space and preventing processes from being killed?


Solution

  • By default, Linux will provide more memory to processes than what is actually available in the system (memory overcommitment). Many memory allocations, such as stacks or malloc heap arenas, are never fully used, so this allows the system to do more work without encountering memory allocation failures.

    However, if processes write to all the memory that was allocated to them, the kernel cannot make good on this promise. There is no way to return an error to the process, because the failure occurs on an ordinary write instruction rather than on an allocation call, so the kernel has to kill a process instead (this is the out-of-memory killer). It uses heuristics to pick a process that is not critical to the system and yet frees a lot of RAM, but they do not always yield good results.

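    In your case, you should probably configure your system with vm.overcommit_memory=2. This disables memory overcommitment: the kernel only hands out memory up to a fixed commit limit, and allocations beyond that limit fail with an error that the process can handle instead of being killed later. Note that the limit is swap plus vm.overcommit_ratio percent of RAM, and the ratio defaults to 50, so with ~100GB of RAM and ~1GB of swap you will likely want to raise vm.overcommit_ratio as well.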
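
To see what vm.overcommit_memory=2 would mean on a particular machine, the commit limit can be estimated from /proc/meminfo. The following is a minimal Python sketch, not a definitive tool: the function names are made up for illustration, it assumes the usual /proc/meminfo layout with values in kB, and it ignores vm.overcommit_kbytes, which overrides the ratio when set to a non-zero value.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (sizes are in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def strict_commit_limit_kb(mem_total_kb, swap_total_kb,
                           overcommit_ratio=50, hugetlb_kb=0):
    """Estimate the commit limit used when vm.overcommit_memory=2.

    Limit = (RAM - huge pages) * overcommit_ratio / 100 + swap.
    """
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


if __name__ == "__main__":
    # Linux only: read the live values from procfs.
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    with open("/proc/sys/vm/overcommit_ratio") as f:
        ratio = int(f.read())

    # HugePages_Total is a page count, Hugepagesize is in kB.
    hugetlb_kb = info.get("HugePages_Total", 0) * info.get("Hugepagesize", 0)
    limit_kb = strict_commit_limit_kb(
        info["MemTotal"], info["SwapTotal"], ratio, hugetlb_kb
    )

    print(f"overcommit_ratio:      {ratio} %")
    print(f"estimated commit limit: {limit_kb / 1024**2:.1f} GiB")
    print(f"currently committed:    {info.get('Committed_AS', 0) / 1024**2:.1f} GiB")
```

If the committed amount is already above the estimated limit, switching to strict mode while the jobs are running would make their further allocations fail, so it is worth comparing the two numbers before changing the setting.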