Tags: java, linux, jakarta-ee, wildfly-8

Can one WildFly instance kill another WildFly instance?


We ran into a problem that seems strange, at least to us:

We have two WildFly 8.1 installations on the same Linux machine (CentOS 6.6), running different versions of the same applications and listening on different ports.

We noticed that, all of a sudden, starting one of them caused the other one to be killed. We then found that free memory was low because of other, leaking processes. Once we killed those, both WildFly instances ran correctly again.

Since I don't think that Linux itself decided to kill some random process, I assume one of two things: either JBoss has some mechanism that frees memory by killing something it assumes is no longer needed, or both instances share some resource (maybe through misconfiguration) and one of them gets killed when it cannot obtain it.

Has anyone experienced something similar, or does anyone know of a mechanism of that sort?


Solution

  • Most probably it was the Linux OOM killer. You can check whether it killed one of the servers by searching the log files:

    grep -i kill /var/log/messages*

    If it did, you should see something like:

    host kernel: Out of Memory: Killed process 2592

    The OOM killer picks its victim by scoring every process. The following description applies to older kernels (newer ones use a different heuristic, but still score each process):

    The function select_bad_process() is responsible for choosing a process to kill. It decides by stepping through each running task and calculating how suitable it is for killing with the function badness(). The badness is calculated as follows; note that the square roots are integer approximations calculated with int_sqrt():

    badness_for_task = total_vm_for_task / (sqrt(cpu_time_in_seconds) *
    sqrt(sqrt(cpu_time_in_minutes)))
    

    This has been chosen to select a process that is using a large amount of memory but is not that long lived. Processes which have been running a long time are unlikely to be the cause of memory shortage so this calculation is likely to select a process that uses a lot of memory but has not been running long.
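
    To get a feel for the formula quoted above, here is a minimal shell sketch that evaluates it for made-up numbers. The function names int_sqrt and badness and the input values are only illustrative, the rounding is a plain floor, and a zero divisor is treated as 1 (an assumption, to avoid dividing by zero). It is not the kernel's actual code.

```shell
# Floor of the square root of a non-negative integer,
# standing in for the kernel's int_sqrt() (illustrative only).
int_sqrt() (
  n=$1
  r=0
  while [ $(( (r + 1) * (r + 1) )) -le "$n" ]; do
    r=$(( r + 1 ))
  done
  echo "$r"
)

# badness <total_vm> <cpu_time_in_seconds>
# Evaluates: total_vm / (sqrt(cpu_seconds) * sqrt(sqrt(cpu_minutes)))
badness() (
  total_vm=$1
  cpu_seconds=$2
  cpu_minutes=$(( cpu_seconds / 60 ))
  a=$(int_sqrt "$cpu_seconds")
  b=$(int_sqrt "$(int_sqrt "$cpu_minutes")")
  # Assumption: treat a zero factor as 1 so we never divide by zero.
  [ "$a" -ge 1 ] || a=1
  [ "$b" -ge 1 ] || b=1
  echo $(( total_vm / (a * b) ))
)

badness 500000 60      # same memory, 1 minute of CPU time
badness 500000 36000   # same memory, 10 hours of CPU time
```

    With these inputs the one-minute process scores 71428 and the ten-hour process scores 661, which matches the point of the paragraph above: for the same memory footprint, the younger process is the far more likely victim.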

    You can see the current badness score of a process by reading the oom_score file in its directory under /proc:

    cat /proc/10292/oom_score
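
    Checking one PID at a time gets tedious when you want to know which process the OOM killer would pick next. The sketch below lists the processes with the highest oom_score. The function name top_oom_scores and its two parameters are hypothetical, and it assumes /proc/<pid>/comm is available for the process name (it prints "?" where it is not).

```shell
# top_oom_scores [proc_root] [count]
# Prints "<oom_score> <pid> <name>" for the highest-scoring processes,
# highest score first. proc_root defaults to /proc, count to 5.
top_oom_scores() (
  proc_root=${1:-/proc}
  count=${2:-5}
  for dir in "$proc_root"/[0-9]*; do
    # Skip entries that vanished or are not readable.
    [ -r "$dir/oom_score" ] || continue
    score=$(cat "$dir/oom_score" 2>/dev/null) || continue
    # Assumption: comm holds the short process name; fall back to "?".
    name=$(cat "$dir/comm" 2>/dev/null) || name="?"
    echo "$score ${dir##*/} $name"
  done | sort -rn | head -n "$count"
)

top_oom_scores /proc 5
```

    If both WildFly JVMs show up near the top of this list while memory is tight, that fits the OOM killer explanation.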