ambari hdp

Ambari cluster: poor connection between ambari-agent and the Ambari server


We have an Ambari cluster with 872 DataNode machines, running Ambari 2.6.x.

We currently have some network problems.

After a long investigation, we found that the Ambari agent running on some machines does not communicate well with the Ambari server.

As a result we see strange behavior, such as 5 dead DataNodes on the Ambari dashboard, even though the DataNode machines are definitely healthy.

Is it possible to set a more tolerant value in the Ambari agent configuration, so that the agent waits longer for the acknowledgment exchanged with the Ambari server and the network problems are tolerated?

Something like a timeout or connection-time setting between the Ambari agent and the Ambari server.


Solution

  • First of all, you need to find the root cause of why the DataNode is shown as dead.

    1. The Ambari agent runs on every node. It is responsible for sending metrics and heartbeats to the Ambari server, which then publishes them to the Ambari web UI.
    2. The NameNode waits about 10 minutes (10 minutes 30 seconds with the default HDFS settings) before it declares a DataNode dead and re-replicates its blocks to other DataNodes.
    3. If a DataNode is shown as dead, check the Ambari agent status on that node by running: service ambari-agent status. In parallel, check ambari-agent.log on the worker node to see why the Ambari agent stopped working.
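
The wait in point 2 can be worked out from two HDFS properties. The sketch below is an addition to the answer. It assumes the stock hdfs-default.xml values for dfs.heartbeat.interval (3 seconds) and dfs.namenode.heartbeat.recheck-interval (300000 ms), and the function name is made up for illustration. Your cluster may override both values in hdfs-site.xml. This is the DataNode-to-NameNode heartbeat, which is separate from the Ambari agent heartbeat to the Ambari server.

```python
# Assumed defaults from hdfs-default.xml; check hdfs-site.xml for overrides.
HEARTBEAT_INTERVAL_S = 3        # dfs.heartbeat.interval, in seconds
RECHECK_INTERVAL_MS = 300_000   # dfs.namenode.heartbeat.recheck-interval, in ms


def datanode_dead_timeout_s(heartbeat_interval_s=HEARTBEAT_INTERVAL_S,
                            recheck_interval_ms=RECHECK_INTERVAL_MS):
    """Seconds without a heartbeat before the NameNode marks a DataNode dead.

    Hypothetical helper: timeout = 2 * recheck interval + 10 * heartbeat interval.
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


if __name__ == "__main__":
    timeout = datanode_dead_timeout_s()
    # With the assumed defaults this prints: 630 s = 10.5 min
    print(f"{timeout:.0f} s = {timeout / 60:.1f} min")
```

If the dashboard shows a DataNode as dead while hdfs dfsadmin -report still lists it as live, the problem is more likely on the Ambari agent side than in HDFS itself.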