
Pig GROUP BY: how to avoid getting a bag for the grouped field


This is a basic Pig question. My data looks something like this:

10 | Dog
15 | Cow
20 | Dog
15 | Elephant
15 | Dog
25 | Elephant

I want to find the average weight of each animal and get output like this:

Dog | 12.5
Elephant | 20
Cow | 15

I can use GROUP BY and get the averages, but the animal column comes out as a bag, something like this:

 {(Dog), (Dog) } | 12.5
 {(Elephant), (Elephant)} | 20
 {(Cow)} | 15

How can I extract just the individual animal name?

This is how I am using GROUP BY:

--animal_weight is derived through other means
animal_by = GROUP animal_weight by (animal);
results = FOREACH animal_by GENERATE animal_weight.animal as animal_name, AVG(animal_weight.weight) as kg;
STORE results INTO '$output_4' USING PigStorage('|');

Solution

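  • Use group instead of animal_weight.animal. After a GROUP, the key you grouped on is available as a field named group, whereas animal_weight.animal projects the animal field out of every tuple in the bag, which is why you get a bag back. Note also that, from your sample data, Dog should have an average weight of (10+20+15)/3 = 15 kg, not 12.5.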

    results = FOREACH animal_by GENERATE group as animal_name, AVG(animal_weight.weight) as kg;
    

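Since the question says animal_weight is derived through other means, here is a minimal end-to-end sketch for trying the fix in isolation. It assumes, hypothetically, that the sample data sits in a local file named animals.txt in the same "weight | animal" layout shown above; the file name and the TRIM/cast cleanup step are assumptions added for illustration, not part of the original script.

```pig
-- Hypothetical input: 'animals.txt' holds lines such as "10 | Dog".
raw = LOAD 'animals.txt' USING PigStorage('|')
      AS (weight:chararray, animal:chararray);

-- Assumed cleanup step: strip the spaces around the '|' delimiter
-- and cast the weight to int so AVG can work on it.
animal_weight = FOREACH raw GENERATE
    (int)TRIM(weight) AS weight,
    TRIM(animal)      AS animal;

-- Each grouped tuple has two fields: 'group' (the animal name)
-- and 'animal_weight' (a bag of all input tuples for that animal).
animal_by = GROUP animal_weight BY animal;

-- Project the key through 'group' rather than animal_weight.animal,
-- so animal_name is a single chararray instead of a bag.
results = FOREACH animal_by GENERATE
    group                     AS animal_name,
    AVG(animal_weight.weight) AS kg;

DUMP results;
```

You could run it with something like pig -x local script.pig. It should print one tuple per animal, with Dog and Cow at 15 and Elephant at 20; Pig does not guarantee the order of the tuples. To write the result to a file instead, replace DUMP with the STORE ... USING PigStorage('|') line from the question.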