Search code examples
hadoopapache-pig

Pig script to find the max, min,avg,sum of Salary in each department


I get stuck after grouping the data by department no.The steps followed by me

grunt> A = load '/home/cloudera/naveen1/hive_data/emp_data.txt' using PigStorage(',') as (eno:int,ename:chararray,job:chararray,sal:float,comm:float,dno:int); 
grunt> B = group A by don;
grunt> describe B;
B: {group: int,A: {(eno: int,ename: chararray,job: chararray,sal: float,comm: float,dno: int)}}

Please let me know the steps after this.I am bit confused about the Nested Foreach statement execution.

Data contains eno,ename,sal,job,commisson,deptno and i want extract the max sal in each dept and the employee getting the highest salary.

Similary for min sal.


Solution

  • Use the aggregate functions after grouping.

    C = FOREACH B GENERATE group,MAX(A.sal),MIN(A.sal),AVG(A.sal),SUM(A.sal);
    DUMP C;
    

    To get the name,eno and max sal in each dept,sort the records and get the top row

    C = FOREACH B {
            max_sal = ORDER A BY sal DESC;
            max_limit = LIMIT max_sal 1;
            GENERATE FLATTEN(max_limit);
    }
    DUMP C;