java hadoop mapreduce bigdata oozie

Java MapReduce Counters - Oozie


In Oozie, Java applications are executed in the Hadoop cluster as a map-reduce job with a single mapper task (the launcher). When a plain Java MapReduce job (not Hive or any other action type, just a direct MapReduce job) is part of an Oozie workflow, we get this single-mapper launcher job, and the actual MapReduce job runs independently of it. Is there a way to link the launcher to the actual MapReduce job run, for example to get the job id of the actual job from the launcher job id? Is there a command that shows this?


Solution

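  • The launcher id for any child job id can be found through the logs link, which is returned by the MapReduce JobHistory Server REST API (the /ws/v1/history path is served by the history server, not the ResourceManager):

    http://<history server http address:port>/ws/v1/history/mapreduce/jobs/<jobid>/jobattempts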
    

    The response is an XML document that contains the logs link. The syslog behind that link contains a string like

    Service: job_
    

    Search the syslog for this pattern to find the launcher id. If the job has a launcher, its id appears here (this also works for Java actions in an Oozie workflow). The actual line looks something like this:

    INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: mapreduce.job, Service: <jobid>
    

    The job id after Service: is the launcher job id.
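
The two parsing steps above could be sketched in Java roughly as follows. This is only a minimal sketch under some assumptions: the class and method names are hypothetical, the XML is assumed to carry the link in a `logsLink` element (as the JobHistory Server job attempts response is documented to), job ids are assumed to follow the usual `job_<timestamp>_<sequence>` form, and the sample id in `main` is made up. Fetching the XML and the syslog over HTTP is left out; the methods work on text you have already downloaded.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop or Oozie.
final class LauncherIdFinder {

    // Matches "Service: job_<timestamp>_<sequence>" and captures the job id.
    private static final Pattern SERVICE_JOB =
            Pattern.compile("Service:\\s*(job_\\d+_\\d+)");

    // Returns the text of every <logsLink> element in the jobattempts XML
    // (element name is an assumption based on the documented response).
    static List<String> extractLogsLinks(String jobAttemptsXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            jobAttemptsXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("logsLink");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                links.add(nodes.item(i).getTextContent().trim());
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse jobattempts XML", e);
        }
    }

    // Returns the first job id that follows "Service:" in the syslog text,
    // or an empty Optional when no such line exists (no launcher).
    static Optional<String> findLauncherId(String syslogText) {
        Matcher m = SERVICE_JOB.matcher(syslogText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample line shaped like the MRAppMaster line above; the id is invented.
        String sample = "INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: "
                + "Kind: mapreduce.job, Service: job_1400000000000_0001";
        System.out.println(findLauncherId(sample).orElse("no launcher found"));
    }
}
```

Returning an `Optional` keeps the "no launcher" case explicit, since jobs that were not started by an Oozie launcher would not be expected to have this line in their syslog.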