How to measure the running time of a Scala Scalding program?


I have a simple scalding program to transform some data which I execute using com.twitter.scalding.Tool in local mode.

val start = System.nanoTime    

val inputPaths = args("input").split(",").toList
val pipe = Tsv(inputPaths(0))
  // standard pipe operations on my data like .filter( 'myField ), etc.
  .write(Tsv(args("output")))

println("running time: " + (System.nanoTime - start) / 1e6 + "ms")

I would like to measure the running time of the program. I tried the standard trick of recording the time at the beginning and end of the code, but it reports ~100 ms, while the actual running time is closer to 60 s. What is the best way to do this? Thanks!


Solution

  • I found a simple answer: add the shell's time keyword before the hadoop command when running the job. It reports the wall-clock (real) time of the whole run, including JVM startup.

    time hadoop jar myjob.jar ...
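
If you would rather measure from inside the program, the ~100 ms in the question is most likely just the time needed to describe the flow: the body of a Scalding job only builds the pipeline, and the data is processed later, when the job's run method is called. Below is a minimal sketch that times that call instead. It assumes a Scalding version in which Job declares `def run: Boolean` (older releases may use a different signature), and `Timing` and `TimedJob` are hypothetical names introduced here for illustration.

```scala
import com.twitter.scalding._

// Hypothetical helper (not part of Scalding): runs a block and returns its
// result together with the elapsed wall-clock time in milliseconds.
object Timing {
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime
    val result = block
    (result, (System.nanoTime - start) / 1e6)
  }
}

// Hypothetical job name; the pipe mirrors the one in the question.
class TimedJob(args: Args) extends Job(args) {
  // This only describes the flow. Nothing is read or written yet.
  Tsv(args("input").split(",").head)
    .read
    .write(Tsv(args("output")))

  // Assumption: this Scalding version declares `def run: Boolean` on Job.
  // The flow is actually executed inside super.run, so that is what we time.
  override def run: Boolean = {
    val (succeeded, elapsedMs) = Timing.timed(super.run)
    println("running time: " + elapsedMs + " ms")
    succeeded
  }
}
```

Compared with `time hadoop jar`, this leaves out JVM startup and job construction, so expect a somewhat smaller number.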