Tags: hadoop, hadoop-yarn

What does yarn.resourcemanager.am.max-attempts really mean?


I have the following configuration entry in yarn-site.xml:

<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>

I would like to know what it really means, given the following two scenarios:

  1. Say my ApplicationMaster has a bug that prevents it from starting. When I submit the application to YARN, will YARN try to start the ApplicationMaster 5 times and then fail the application?

  2. Say I have started a YARN application and I kill the ApplicationMaster process manually. Will the ApplicationMaster be restarted automatically? If so, and I keep killing it after each restart until it has been killed 4 times, will it no longer be restarted?


Solution

  • Let's say the AM is buggy and dies, or it has a memory leak that causes it to exceed its container size, so it gets killed. Once it has died 4 times (the configured max-attempts value), the application's state becomes FAILED.

    So to answer your question: scenario 1 is true, assuming you mean 4 instead of 5, and scenario 2 is true as well. If you want to understand this in more depth, look at TestAMRestart.java.
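
The counting rule described in this answer can be sketched as a small toy model. This is not YARN code: the class AmRetryModel, its run method, and the attemptSucceeds hook are hypothetical names made up for illustration. It also assumes that every AM exit counts as one failed attempt, which is a simplification of what the ResourceManager actually does.

```java
import java.util.function.IntPredicate;

// Toy model of the attempt limit; not part of Hadoop/YARN.
class AmRetryModel {

    /** Outcome of the model: final application state and attempts launched. */
    static final class Result {
        final String state;
        final int attemptsUsed;

        Result(String state, int attemptsUsed) {
            this.state = state;
            this.attemptsUsed = attemptsUsed;
        }
    }

    /**
     * @param maxAttempts     the value of yarn.resourcemanager.am.max-attempts
     * @param attemptSucceeds hypothetical hook: returns true if the given
     *                        attempt (1-based) runs to completion
     */
    static Result run(int maxAttempts, IntPredicate attemptSucceeds) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attemptSucceeds.test(attempt)) {
                return new Result("FINISHED", attempt);
            }
            // Assumption: the AM died, so this attempt counts against the limit.
        }
        // All allowed attempts were used up without success.
        return new Result("FAILED", maxAttempts);
    }

    public static void main(String[] args) {
        // Scenario 1: a buggy AM that never starts successfully.
        Result buggy = run(4, attempt -> false);
        System.out.println(buggy.state + " after " + buggy.attemptsUsed + " attempts");

        // Scenario 2: the AM is killed by hand twice, then left alone.
        Result killedTwice = run(4, attempt -> attempt > 2);
        System.out.println(killedTwice.state + " after " + killedTwice.attemptsUsed + " attempts");
    }
}
```

With a limit of 4, the buggy AM ends in FAILED after exactly 4 attempts (not 5), while an AM killed only twice still finishes on its third attempt.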