java hadoop benchmarking low-level

Benchmarking Hadoop jobs at a low level


I have to record a couple of benchmark variables. Unfortunately, some of them require me to take measurements inside the Hadoop code itself (map(), reduce(), InputFormat, etc.). I was wondering what the "right" way to do this would be. I could store the measurements in global variables and dump them just before Tool.run() finishes, but I suspect there is a better way. Does anybody know how to do this, or have any ideas?

Update

The benchmark code has to be embedded within Hadoop because of some constraints. I have a "tester" application that runs many Hadoop jobs and collects the benchmark results. The idea is to run the jobs and collect benchmark data from their execution in a single "tester" run.


Solution

  • Nothing is stopping you from benchmarking those methods independently of MapReduce. M/R isn't magic; it's just a JVM running some code on the server for you.

    We run JUnit tests against individual Map and Reduce functions all the time. There is nothing substantially different about profiling them.
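
To make the suggestion above concrete, here is a minimal sketch of timing map-style logic in isolation, outside any running job. It assumes the body of your map() can be pulled out into a plain method that does not need a Hadoop Context. Every name in it (MapBenchmarkSketch, countTokens, timeOnePass, Result) is hypothetical and not part of Hadoop. It uses only the JDK, so treat it as an illustration of the idea, not a rigorous benchmark harness.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.function.ToIntFunction;

// Sketch only: every name here is hypothetical and nothing depends on Hadoop.
final class MapBenchmarkSketch {

    // Elapsed time plus a checksum of the outputs. Returning the checksum
    // makes it less likely that the JIT discards the measured work as dead code.
    static final class Result {
        final long nanos;
        final long checksum;

        Result(long nanos, long checksum) {
            this.nanos = nanos;
            this.checksum = checksum;
        }
    }

    // Stand-in for the logic a map() method would run on one input line:
    // counts the whitespace-separated tokens.
    static int countTokens(String line) {
        return new StringTokenizer(line).countTokens();
    }

    // Applies fn to every input once and times the whole pass.
    static Result timeOnePass(ToIntFunction<String> fn, List<String> inputs) {
        long checksum = 0;
        long start = System.nanoTime();
        for (String input : inputs) {
            checksum += fn.applyAsInt(input);
        }
        return new Result(System.nanoTime() - start, checksum);
    }

    public static void main(String[] args) {
        // Synthetic input; in practice you would load a sample of real records.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            lines.add("record " + i + " has a few tokens");
        }

        // Warm-up passes so the JIT can compile the hot path before measuring.
        for (int i = 0; i < 5; i++) {
            timeOnePass(MapBenchmarkSketch::countTokens, lines);
        }

        int passes = 10;
        long totalNanos = 0;
        long checksum = 0;
        for (int i = 0; i < passes; i++) {
            Result r = timeOnePass(MapBenchmarkSketch::countTokens, lines);
            totalNanos += r.nanos;
            checksum = r.checksum;
        }
        System.out.printf("avg per pass: %.3f ms (checksum %d)%n",
                totalNanos / (double) passes / 1_000_000.0, checksum);
    }
}
```

The same timeOnePass call could be made from a JUnit test instead of main. Keep in mind that hand-rolled timing like this is sensitive to JIT and GC effects, so the numbers are only a rough guide. A dedicated harness such as JMH is usually a better fit if you need figures you can compare across runs.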