
Get the name of the employee with the maximum salary using Hadoop MapReduce


I am very new to MapReduce programs. I have a file in HDFS with data in this structure:

EmpId,EmpName,Dept,Salary

1231,userName1,Dept1,5000
1232,userName2,Dept2,6000
1233,userName3,Dept3,7000
.
.
.........................

Now I want to find the name of the employee who earns the highest salary.

I have written a MapReduce job to find the highest salary. In my mapper class I emit the output like this:

output.collect("max value", salary of the employee);

In the reducer I find the maximum of the values under the key "max value". Now I want to use this value in a mapper to find the names of the employees who earn the maximum salary. How can I send the reducer output to a mapper as input? Is this a good approach to accomplish my task? Any other suggestions?


Solution

  • I would have each mapper emit the full record (tuple) of the employee with the maximum salary it sees, not just the salary. For that, create a class (for the value) that implements the Writable interface (http://hadoop.apache.org/docs/r1.2.0/api/org/apache/hadoop/io/Writable.html). TupleWritable may suit your needs; it is not very complex.

    Since only one value is emitted per map task, network traffic is not an issue, and it is fine to receive the full tuple data in the reducer. Your reducer then just has to pick the top record from the per-mapper "max" values.

    For more complex problems, you will have to think about chaining jobs (http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining).
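
To make the idea in this answer concrete, here is a minimal sketch of the logic in plain Java. It deliberately uses no Hadoop classes, so it can be run anywhere. The names MaxSalarySketch, Employee, mapSplit and reduce are hypothetical and only illustrate the data flow. In a real job the Employee class would implement Writable, and with the newer org.apache.hadoop.mapreduce API the per-mapper maximum would typically be tracked in map() and emitted once from cleanup(). The sketch also assumes the input contains no header line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaxSalarySketch {

    /** One parsed input line: EmpId,EmpName,Dept,Salary (hypothetical value class). */
    static final class Employee {
        final String id;
        final String name;
        final String dept;
        final int salary;

        Employee(String id, String name, String dept, int salary) {
            this.id = id;
            this.name = name;
            this.dept = dept;
            this.salary = salary;
        }
    }

    /** Parses a comma-separated line; assumes exactly the four fields shown in the question. */
    static Employee parse(String line) {
        String[] f = line.split(",");
        return new Employee(f[0].trim(), f[1].trim(), f[2].trim(),
                Integer.parseInt(f[3].trim()));
    }

    /** "Map" side: scan one input split and emit only its highest-paid record. */
    static Employee mapSplit(List<String> lines) {
        Employee best = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Employee e = parse(line);
            if (best == null || e.salary > best.salary) {
                best = e;
            }
        }
        return best; // null if the split had no records
    }

    /** "Reduce" side: all per-split maxima arrive under one key; keep the top one. */
    static Employee reduce(List<Employee> candidates) {
        Employee best = null;
        for (Employee e : candidates) {
            if (e != null && (best == null || e.salary > best.salary)) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two pretend input splits, using the sample rows from the question.
        List<String> split1 = Arrays.asList(
                "1231,userName1,Dept1,5000",
                "1232,userName2,Dept2,6000");
        List<String> split2 = Arrays.asList("1233,userName3,Dept3,7000");

        List<Employee> perSplitMax = new ArrayList<>();
        perSplitMax.add(mapSplit(split1));
        perSplitMax.add(mapSplit(split2));

        Employee top = reduce(perSplitMax);
        System.out.println(top.name + " " + top.salary); // prints "userName3 7000"
    }
}
```

Because the whole record travels with the salary, no second job is needed to look up the name. Note that this sketch keeps a single record, so if several employees share the maximum salary only the first one seen is returned; keeping a list of tied records on both sides would handle that case.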