apache-pig

PIG Group into bag by distinct value


recipe,ingredient,inventor
Tacos,Beef,Alex
Tacos,Lettuce,Alex
Tacos,Cheese,Alex
TomatoSoup,Tomatoes,Steve
TomatoSoup,Milk,Steve

I want to group the records by recipe, collecting the ingredients into a bag and keeping the inventor, like this:

(Tacos,{Beef,Lettuce,Cheese},Alex)
(TomatoSoup,{Tomatoes,Milk},Steve)

Solution

  • Group by both recipe and inventor, then reorder the columns as required.

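    -- Load the CSV with an explicit schema.
    A = LOAD 'data.txt' USING PigStorage(',') AS (recipe:chararray,ingredient:chararray,inventor:chararray);
    -- Group on both keys so that inventor stays available after grouping.
    B = GROUP A BY (recipe,inventor);
    -- Flatten the (recipe,inventor) group key and collect the ingredients into a bag.
    C = FOREACH B GENERATE FLATTEN(group) AS (recipe,inventor),A.ingredient;
    -- Reorder the columns: recipe, ingredient bag, inventor.
    D = FOREACH C GENERATE recipe,ingredient,inventor;
    DUMP D;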
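
    The flatten-then-reorder steps can probably be collapsed into a single FOREACH by projecting fields straight out of the group tuple. This is a minimal sketch, assuming data.txt contains only data rows (no header line) and the same three-column schema as in the question; the alias ingredients is an illustrative name, not part of the original answer.

```pig
-- Load the CSV with an explicit schema (assumes no header row in data.txt).
A = LOAD 'data.txt' USING PigStorage(',') AS (recipe:chararray, ingredient:chararray, inventor:chararray);

-- Group on both keys; 'group' becomes a tuple (recipe, inventor).
B = GROUP A BY (recipe, inventor);

-- Project the key fields by name and place the ingredient bag between them.
-- 'ingredients' is an illustrative alias.
C = FOREACH B GENERATE group.recipe AS recipe, A.ingredient AS ingredients, group.inventor AS inventor;

DUMP C;
```

    Note that a Pig bag holds tuples, so each ingredient is printed as a one-field tuple, for example {(Beef),(Lettuce),(Cheese)} rather than {Beef,Lettuce,Cheese}, and the order of items inside a bag is not guaranteed.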