hadoop mapreduce bigdata hadoop-yarn hadoop2

Duration of yarn application log in hadoop


I am using the output of the yarn application command in Hadoop to look up details of MapReduce jobs by job name. My cluster runs the HDP distribution. Does anyone know how long the job status remains available? Does it keep track of jobs from the previous few days?


Solution

  • It depends on your cluster configuration. Production clusters usually run a history/archive server that holds the logs of previous runs. Where log retention is set to 1 day, only one day of logs is preserved; the period actually in effect depends on properties such as yarn.log-aggregation.retain-seconds in yarn-site.xml, so check your own configuration.

    If the history server is running, its web UI listens on port 19888 by default. Check mapred-site.xml for the entry below:

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>{job-history-hostname}:19888</value>
    </property>
    

    and yarn-site.xml

    <property>
        <name>yarn.log.server.url</name>
        <value>http://{job-history-hostname}:19888/jobhistory/logs</value>
    </property>
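
    The entries above only tell you where the history server lives; how long logs and job history are kept is controlled by separate properties. A minimal sketch, assuming log aggregation is enabled on the cluster. The 7-day values are illustrative assumptions, not the settings of any particular HDP installation:

    ```xml
    <!-- yarn-site.xml (illustrative values, adjust for your cluster) -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <!-- how long aggregated application logs are kept, in seconds (here 7 days) -->
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

    <!-- mapred-site.xml -->
    <property>
        <!-- how long the history server keeps job history files, in milliseconds (here 7 days) -->
        <name>mapreduce.jobhistory.max-age-ms</name>
        <value>604800000</value>
    </property>
    ```

    Note that the ResourceManager itself remembers only a bounded number of completed applications (yarn.resourcemanager.max-completed-applications), so yarn application -list -appStates FINISHED may not show older jobs even while their logs are still retained; those are better looked up through the history server.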