hadoop crash load hortonworks-data-platform

Hortonworks Data Platform: High load causes node restart

I have set up a Hadoop cluster with Hortonworks Data Platform 2.5. I'm using 1 master and 5 slave (worker) nodes.

Every few days, one (or more) of my worker nodes comes under high load and seems to restart the whole CentOS operating system automatically. After the restart, the Hadoop components are no longer running and have to be restarted manually via the Ambari management UI.

Here is a screenshot of the "crashed" node (it rebooted after the high load value about 4 hours ago):

Here is a screenshot of one of the other "healthy" worker nodes (all other workers have similar values):

The crashes alternate between the 5 worker nodes; the master node seems to run without problems.

What could cause this problem? Where are these high load values coming from?

Solution

This seems to be a kernel problem, as the crash log (e.g. /var/spool/abrt/vmcore-127.0.0.1-2017-06-26-12:27:34/backtrace) contains lines like:

Version: 3.10.0-327.el7.x86_64
BUG: unable to handle kernel NULL pointer dereference at 00000000000001a0
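
To check whether other worker nodes recorded the same kind of crash, the backtrace files can be searched for "BUG:" lines. This is only a minimal sketch: the function name find_kernel_bugs is made up for illustration, and it assumes abrt stores its dumps under /var/spool/abrt in files named backtrace, as in the path quoted above.

```shell
# Hypothetical helper: print every kernel "BUG:" line found in abrt
# backtrace files below a dump directory.
# Assumption: dumps live under /var/spool/abrt (taken from the path above).
find_kernel_bugs() {
    dump_dir="${1:-/var/spool/abrt}"
    # Look only at files named "backtrace"; -H prefixes each match
    # with the file it came from.
    find "$dump_dir" -type f -name backtrace -exec grep -H 'BUG:' {} + 2>/dev/null
}

# Usage (reading /var/spool/abrt usually requires root):
#   find_kernel_bugs
#   find_kernel_bugs /some/other/dump/dir
```

Running this on each worker should show whether the crashes share the same NULL pointer dereference or have different causes.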

After running sudo yum update, the kernel version was:

[root@myhost ~]# uname -r
3.10.0-514.26.2.el7.x86_64
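
To spot worker nodes that are still running an older kernel, the output of uname -r can be compared against the updated version. The following is a rough sketch, not a definitive check: the function name kernel_older_than is hypothetical, and it relies on sort -V from GNU coreutils, whose ordering only approximates RPM version comparison.

```shell
# Hypothetical helper: succeeds (exit status 0) if the first kernel
# version string sorts strictly before the second one.
# Assumption: GNU sort with -V (version sort) is available.
kernel_older_than() {
    current="$1"
    target="$2"
    # Equal versions are not "older".
    [ "$current" != "$target" ] &&
        # After a version sort, the older string comes first.
        [ "$(printf '%s\n%s\n' "$current" "$target" | sort -V | head -n 1)" = "$current" ]
}

# Usage on a node:
#   if kernel_older_than "$(uname -r)" 3.10.0-514.26.2.el7.x86_64; then
#       echo "kernel predates the update"
#   fi
```

With the two versions from this post, 3.10.0-327.el7.x86_64 is reported as older than 3.10.0-514.26.2.el7.x86_64.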

Since the operating system update, the problem has not occurred again. I will keep observing the issue and give feedback if necessary.