Tags: hadoop-yarn, hadoop2

YARN Architecture of Hadoop 2.0


From the Apache Hadoop documentation linked below, I learned that

ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler (ResourceManager)

and also that

the ApplicationsManager is responsible for negotiating the first container for executing the ApplicationMaster

Link : http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

So here is my confusion.

  1. If the ApplicationMaster is responsible for requesting containers from the ResourceManager, then who creates the first container (the one that executes the ApplicationMaster), and what is the process for creating it?
  2. Does anyone send a request to create that first container?
  3. What are the responsibilities of the first container? Does it only execute the ApplicationMaster, or does it also behave like the other resource containers?

Please let me know if anyone has an idea about this.



Solution

  • First of all, you are confusing the terms ApplicationsManager and ApplicationMaster. They are not the same; have a look at my answer on the difference between the ApplicationsManager and the ApplicationMaster in YARN.

    Answers to your questions are given below:

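    1. The client, through YarnClient, is responsible for submitting the application to the ResourceManager. It sends an ApplicationSubmissionContext object, which holds all of the information the ResourceManager needs to launch the ApplicationMaster for the application. The ApplicationsManager inside the ResourceManager then obtains the first container from the Scheduler and has the NodeManager on that node launch the ApplicationMaster in it.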

    2. Yes, YarnClient does that!

    3. The first container runs the ApplicationMaster. Its job is to request resources (containers) from the ResourceManager and to make application-level decisions. If the ResourceManager provides enough containers (as defined by the logic in your ApplicationMaster), the ApplicationMaster can go ahead and launch the application code in those containers. Furthermore, the ApplicationMaster keeps track of failed containers and either relaunches them or terminates the application (killing all other containers), again based on the logic of your ApplicationMaster.
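The sequence in these three answers can be sketched as a toy, single-process Python model. It is only an illustration under simplifying assumptions: every name in it (ResourceManager, ApplicationMaster, submit_application, request_containers, max_relaunches, and so on) is a hypothetical stand-in, memory is the only resource, and real YARN performs these steps over RPC between separate processes through the Java API (YarnClient for submission, AMRMClient for the ApplicationMaster's requests, NodeManagers for launching). What it shows is that the client's submission is the only "request" for the first container, that this container runs only the ApplicationMaster, and that any further containers come from the ApplicationMaster's own requests.

```python
from dataclasses import dataclass

# Toy, single-process model of how a YARN application starts. All names are
# hypothetical stand-ins, NOT the Hadoop YARN Java API (YarnClient, AMRMClient).


@dataclass
class SubmissionContext:  # plays the role of ApplicationSubmissionContext
    app_name: str
    am_command: str  # what the first container should run
    am_memory_mb: int


@dataclass
class Container:
    container_id: int
    memory_mb: int


class ResourceManager:
    def __init__(self, cluster_memory_mb):
        self.free_mb = cluster_memory_mb
        self.next_id = 1
        self.launched = []  # (container_id, command), in launch order

    def _allocate(self, memory_mb):
        # "Scheduler": grant a container only if capacity is left.
        if memory_mb > self.free_mb:
            return None
        self.free_mb -= memory_mb
        container = Container(self.next_id, memory_mb)
        self.next_id += 1
        return container

    def launch(self, container, command):
        # Stands in for a NodeManager starting a process in the container.
        self.launched.append((container.container_id, command))

    def release(self, container):
        self.free_mb += container.memory_mb

    def submit_application(self, ctx):
        # "ApplicationsManager": the client's submission is the only request
        # for the first container; the RM allocates it and launches the AM.
        am_container = self._allocate(ctx.am_memory_mb)
        if am_container is None:
            raise RuntimeError("no capacity to start the ApplicationMaster")
        self.launch(am_container, ctx.am_command)
        return ApplicationMaster(self, am_container)

    def request_containers(self, count, memory_mb):  # may grant fewer
        granted = []
        for _ in range(count):
            container = self._allocate(memory_mb)
            if container is None:
                break
            granted.append(container)
        return granted


class ApplicationMaster:
    def __init__(self, rm, own_container):
        self.rm = rm
        self.own_container = own_container  # runs only the AM, no tasks
        self.workers = []

    def _release_all(self):
        for container in self.workers:
            self.rm.release(container)
        self.workers = []

    def run(self, needed, memory_mb, task_command,
            failing_ids=(), max_relaunches=1):
        self.workers = self.rm.request_containers(needed, memory_mb)
        if len(self.workers) < needed:
            # Application-level decision: too few containers, so give up.
            self._release_all()
            return "FAILED: not enough containers"
        for container in self.workers:
            self.rm.launch(container, task_command)
        # Track failed containers, then either relaunch them or kill the app.
        failed = [c for c in self.workers if c.container_id in failing_ids]
        if len(failed) > max_relaunches:
            self._release_all()  # kill every remaining container
            return "KILLED: too many failed containers"
        for container in failed:
            self.workers.remove(container)
            self.rm.release(container)
            # The capacity just freed guarantees a replacement in this model.
            (replacement,) = self.rm.request_containers(1, memory_mb)
            self.rm.launch(replacement, task_command)
            self.workers.append(replacement)
        return "SUCCEEDED"


rm = ResourceManager(cluster_memory_mb=4096)
am = rm.submit_application(SubmissionContext("demo", "run-am", 1024))
print(am.run(2, 1024, "run-task", failing_ids={2}))  # SUCCEEDED
print(rm.launched)  # [(1, 'run-am'), (2, 'run-task'), (3, 'run-task'), (4, 'run-task')]
```

The relaunch-or-kill choice is deliberately a parameter of the toy ApplicationMaster (the hypothetical max_relaunches) rather than of the ResourceManager, mirroring the point above that this policy lives in your ApplicationMaster's own logic.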

    To understand the internals of Hadoop YARN, I suggest reading the YARN paper or, if you have more time, a book on Hadoop YARN.