I have a simple Scalding program that transforms some data, which I run with com.twitter.scalding.Tool in local mode.
val start = System.nanoTime
val inputPaths = args("input").split(",").toList
val pipe = Tsv(inputPaths(0))
  // standard pipe operations on my data like .filter( 'myField ), etc.
  .write(Tsv(args("output")))
println("running time: " + (System.nanoTime - start) / 1e6 + "ms")
I would like to measure the running time of the program. I tried the standard trick of recording the time at the beginning and end of the code, but it reports ~100 ms, while the actual running time is closer to 60 s. What is the best way to do this? Thanks!
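I found a simple answer. The timer in the code only covers the body of the job class, which just builds the flow plan; the flow itself runs afterwards, when Tool calls run on the job. To time the whole thing, put the time keyword in front of the hadoop command when running the job.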
time hadoop jar myjob.jar ...
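If you want the figure written to a log instead of reading it off the terminal, a small wrapper can do roughly the same thing. This is only a sketch: run_timed is a helper name I made up (it is not part of Hadoop or Scalding), it has one-second resolution, and sleep 1 stands in for the real hadoop command.

```shell
#!/usr/bin/env bash
# run_timed is a hypothetical helper, not part of Hadoop or Scalding.
# It runs the given command, prints the elapsed wall-clock seconds to
# stderr, and returns the command's own exit status.
run_timed() {
  local start end status
  start=$(date +%s)   # seconds since the epoch, before the command
  "$@"
  status=$?
  end=$(date +%s)     # seconds since the epoch, after the command
  echo "running time: $((end - start)) s" >&2
  return "$status"
}

# Stand-in for the real job; replace with: hadoop jar myjob.jar ...
run_timed sleep 1
```

The timing line goes to stderr so it does not mix with anything the job prints to stdout, and the exit status is passed through so a failed job still looks failed to the calling script.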