I have a configuration entry in the yarn-site.xml as follows:
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>
I would like to know exactly what it means, given the following two scenarios:
1. Say I have an ApplicationMaster and it has a bug. When I submit the application to YARN, will it try to start the ApplicationMaster 5 times and then fail the application? (I assume the bug prevents the AM from starting.)
2. Say I have started a YARN application and I kill the ApplicationMaster process manually. Will the ApplicationMaster be restarted automatically? If so, and I repeat this (kill the AM, see it restarted) another 4 times, will the AM stop being restarted?
Let's say the AM is buggy and dies, or it has a memory leak that makes it exceed its container size and it gets killed. If it dies 4 times, the application's state becomes FAILED.
So, to answer your questions: 1 is true, assuming you mean 4 attempts rather than 5 (the value counts every attempt, including the first one), and 2 is true as well. To understand this in more depth, look at TestAMRestart.java.
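To make the counting rule concrete, here is a minimal Java sketch under a simplified assumption: every AM failure (crash, failure to start, manual kill) uses up one attempt, and the first launch counts as attempt 1. The names AmRetrySimulation, run and Result are hypothetical and are not part of the YARN API; this only models the behaviour described above.

```java
import java.util.function.IntPredicate;

// Hypothetical, simplified model of how AM attempts are counted against
// yarn.resourcemanager.am.max-attempts. This is NOT YARN source code.
class AmRetrySimulation {

    enum AppState { FINISHED, FAILED }

    // Final application state plus how many AM attempts were launched.
    static final class Result {
        final AppState state;
        final int attemptsLaunched;

        Result(AppState state, int attemptsLaunched) {
            this.state = state;
            this.attemptsLaunched = attemptsLaunched;
        }
    }

    // maxAttempts: total attempts allowed, including the first launch.
    // attemptSucceeds: true if attempt n (1-based) completes normally,
    // false if that AM crashes, fails to start, or is killed.
    static Result run(int maxAttempts, IntPredicate attemptSucceeds) {
        int launched = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            launched++;
            if (attemptSucceeds.test(attempt)) {
                return new Result(AppState.FINISHED, launched);
            }
            // Assumption: every failure counts toward the limit.
        }
        return new Result(AppState.FAILED, launched);
    }

    public static void main(String[] args) {
        int maxAttempts = 4; // value from the yarn-site.xml above

        // Scenario 1: a buggy AM that can never start.
        Result buggy = run(maxAttempts, attempt -> false);
        // prints "buggy AM: FAILED after 4 attempts"
        System.out.println("buggy AM: " + buggy.state
                + " after " + buggy.attemptsLaunched + " attempts");

        // Scenario 2: the AM is killed by hand on attempts 1-3; attempt 4 survives.
        // Killing it a 4th time as well would behave like scenario 1.
        Result killedThreeTimes = run(maxAttempts, attempt -> attempt == 4);
        // prints "killed 3 times: FINISHED after 4 attempts"
        System.out.println("killed 3 times: " + killedThreeTimes.state
                + " after " + killedThreeTimes.attemptsLaunched + " attempts");
    }
}
```

A real cluster has rules this sketch ignores: an application can request a lower limit through ApplicationSubmissionContext.setMaxAppAttempts (capped by the global setting), and in recent Hadoop versions some failures, such as preemption, are not counted toward the limit.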