Tags: hadoop, cpu, workload

Hadoop workload


I am currently using the wordcount application in Hadoop as a benchmark. I find that the CPU usage is fairly constant, around 80-90%. I would like to have a fluctuating CPU usage. Is there any Hadoop application that can give me this capability? Thanks a lot.


Solution

  • I don't think there's a way to throttle Hadoop's CPU usage or restrict it to a range; Hadoop will use whatever CPU is available to it. When I'm running a lot of jobs, I'm constantly in the 90%+ range.

    One way you can control CPU usage is to change the maximum number of mappers/reducers each tasktracker can run simultaneously. This is done through the mapred.tasktracker.{map|reduce}.tasks.maximum settings in $HADOOP_HOME/conf/mapred-site.xml.

    When the number of concurrent mappers/reducers is limited, that tasktracker will use less CPU; see the config sketch below.
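
    As a minimal sketch, here is roughly what those settings look like in mapred-site.xml; the values shown (2 of each) are example numbers, not recommendations:

    ```xml
    <!-- $HADOOP_HOME/conf/mapred-site.xml (MRv1); values are examples only -->
    <configuration>
      <property>
        <!-- Max map tasks this tasktracker runs at once -->
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>2</value>
      </property>
      <property>
        <!-- Max reduce tasks this tasktracker runs at once -->
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>2</value>
      </property>
    </configuration>
    ```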

    Another way is to set mapred.{map|reduce}.tasks when setting up the job. This tells that job how many mappers/reducers to use (for maps it's only a hint, since the actual number also depends on the input splits; the reduce count is honored exactly). The tasks will be split across the available tasktrackers, so if you have 4 nodes and want each node to have 1 mapper you'd set mapred.map.tasks to 4. There's no guarantee of an even spread, though: if a node can run 4 mappers, it may run all 4; I don't know exactly how Hadoop will split out the tasks. But forcing a number, per job, is an option, as in the sketch after this.
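
    As a minimal sketch, assuming the old MRv1 JobConf API and a hypothetical driver class MyJob (the task counts here are example values):

    ```java
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MyJob {  // hypothetical driver class
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MyJob.class);
            conf.setJobName("wordcount");

            // Same as setting mapred.map.tasks: a hint to the framework,
            // since the real number of maps also depends on input splits.
            conf.setNumMapTasks(4);

            // Same as setting mapred.reduce.tasks: honored exactly.
            conf.setNumReduceTasks(2);

            // ... set mapper/reducer classes and input/output paths here ...

            JobClient.runJob(conf);
        }
    }
    ```

    If the driver goes through ToolRunner, the same properties should also be settable from the command line, e.g. with -D mapred.map.tasks=4.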

    I hope that helps get you to where you're going. I still don't quite understand what you are looking for. :)