I used to launch my Hadoop job with the following:
long start = new Date().getTime();
boolean status = job.waitForCompletion(true);
long end = new Date().getTime();
This way I could measure the time taken by the job once it ends directly in my code.
Now I have to use the JobControl in order to express dependencies between my jobs:
JobControl jobControl = new JobControl("MyJob");
jobControl.addJob(job1);
jobControl.addJob(job2);
job3.addDependingJob(job2);
jobControl.addJob(job3);
jobControl.run();
However, once jobControl.run() has been called, execution never gets past it, so I cannot add code that polls jobControl.getState() to detect when the jobs complete.
How can I measure the time taken by a job using JobControl?
JobControl has no nice functionality to allow you to hook and get this information. You have some (potentially painful) options to try:
- Run JobControl.run() in a separate thread, and in your main thread poll the JobControl.getXXXJobs() methods to track when jobs change state.
- Extend the JobControl and jobcontrol.Job classes to track when a job changes state, and add methods to query the start / end times.
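The first option can be sketched roughly as follows. JobControl implements Runnable, and its run() method keeps looping until stop() is called, which is why the call in the question never returns. The sketch below does not depend on Hadoop so that it can be run on its own: Controller and FakeController are hypothetical stand-ins that mirror only the run(), allFinished() and stop() methods of JobControl, and runAndTime is an illustrative helper name, not part of any Hadoop API.

```java
import java.util.concurrent.TimeUnit;

class JobControlTiming {

    /** Hypothetical stand-in for the JobControl methods used below. */
    interface Controller extends Runnable {
        boolean allFinished();
        void stop();
    }

    /**
     * Runs the controller on a background thread, polls it from the calling
     * thread until it reports completion, and returns the elapsed milliseconds.
     */
    static long runAndTime(Controller control, long pollMillis) {
        Thread runner = new Thread(control, "job-control-runner");
        runner.setDaemon(true); // do not keep the JVM alive for this thread
        long start = System.nanoTime();
        runner.start();
        try {
            while (!control.allFinished()) {
                // With the real JobControl, the getXXXJobs() lists could be
                // inspected here to record per-job state changes.
                Thread.sleep(pollMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        long end = System.nanoTime();
        control.stop(); // lets the run() loop exit
        return TimeUnit.NANOSECONDS.toMillis(end - start);
    }

    /** Fake controller for the demo: reports completion after a fixed delay. */
    static class FakeController implements Controller {
        private final long workMillis;
        private volatile boolean finished = false;
        private volatile boolean stopped = false;

        FakeController(long workMillis) {
            this.workMillis = workMillis;
        }

        @Override
        public void run() {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(workMillis);
            // Like JobControl.run(), keep looping until stop() is called.
            while (!stopped) {
                if (System.nanoTime() - deadline >= 0) {
                    finished = true;
                }
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        @Override
        public boolean allFinished() {
            return finished;
        }

        @Override
        public void stop() {
            stopped = true;
        }
    }

    public static void main(String[] args) {
        long elapsed = runAndTime(new FakeController(50), 10);
        System.out.println("Elapsed ms: " + elapsed);
    }
}
```

With Hadoop on the classpath, the same loop should work directly against a real JobControl instance (new Thread(jobControl), jobControl.allFinished(), jobControl.stop()). The measured time includes up to one polling interval of slack, so pick pollMillis according to the precision you need.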