Using Aggregate functions in Pig


My input file is below:

a1,1,on,400
a1,2,off,100
a1,3,on,200

I need to sum $3 only for rows where $2 equals "on", and sum $1 over all rows with no filter. I have written the script below, but I don't know how to proceed from there: the sum of $3 needs some kind of filter, while the sum of $1 does not.

Can someone help me finish this?

-- num is a placeholder name for the second column ($1); the file has four columns
myinput = LOAD 'file' USING PigStorage(',') AS (id:chararray, num:int, flag:chararray, amt:int);
grouped = GROUP myinput BY id;

I need the output below (6 = 1 + 2 + 3 over all rows; 600 = 400 + 200 over the "on" rows only):

a1,6,600


Solution

  • Here is a possible solution.

    You could do something like this (not tested):

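    -- Declare a schema so the sums are computed on ints; num is a placeholder name for the second column.
    myinput = LOAD 'file' USING PigStorage(',') AS (id:chararray, num:int, flag:chararray, amt:int);
    -- Keep num as is; replace amt with 0 unless the flag is 'on'.
    A = FOREACH myinput GENERATE id, num AS first_sum, ((flag == 'on') ? amt : 0) AS second_sum;
    grouped = GROUP A BY id;
    -- Sum both columns over the bag A inside each group.
    RESULT = FOREACH grouped GENERATE group AS id, SUM(A.first_sum), SUM(A.second_sum);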
    

    That should do the trick: rows whose flag is not 'on' contribute 0 to the second sum, so no separate filter is needed.
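
    If you would rather apply an actual filter, as the question suggests, a nested FOREACH is another option. This is only a sketch (also not tested): the field name num and the aliases on_rows, num_total and amt_on_total are made up for illustration, and 'file' is the path from the question.

    ```pig
    -- Schema assumed from the sample rows; num is a hypothetical name for the second column.
    myinput = LOAD 'file' USING PigStorage(',') AS (id:chararray, num:int, flag:chararray, amt:int);
    grouped = GROUP myinput BY id;
    result = FOREACH grouped {
        -- Inside each group, keep only the rows whose flag is 'on'.
        on_rows = FILTER myinput BY flag == 'on';
        -- num is summed over every row of the group, amt only over the filtered rows.
        GENERATE group AS id, SUM(myinput.num) AS num_total, SUM(on_rows.amt) AS amt_on_total;
    };
    DUMP result;
    ```

    One difference to keep in mind: as far as I know, SUM over an empty bag yields null rather than 0, so a group with no 'on' rows would show an empty third column here, whereas the bincond version above would show 0.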