
Pig SUM Isn't Working


I'm running the following Pig script, but I'm getting ERROR 1066: Unable to open iterator for alias H.

A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth,$7 AS deathYear;
C = FILTER B BY birthMonth==10 and deathYear==2011 and bats=='R';
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
E = FOREACH D GENERATE $0 AS id, $7 AS hits;
F = JOIN E BY id, C BY id;
G = GROUP F BY E.id;
H = FOREACH G GENERATE $0, SUM($1.hits);
DUMP H;

When I describe G, I get:

G: {group: bytearray,F: {(E::id: bytearray,E::hits: int,C::id:bytearray,
    C::first: bytearray,C::last: bytearray,C::bats:bytearray,
    C::birthMonth: bytearray,C::deathYear: bytearray)}}

I've tried a ton of things inside of the SUM() function: F:hits, F.hits, F.E.hits, E.hits, E:hits but I don't know how I'm supposed to reference the tuple within the bag.

Thanks for ideas.


Solution

  • I suggest you try this (I haven't tested it):

    A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
    B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth,$7 AS deathYear;
    C = FILTER B BY birthMonth==10 and deathYear==2011 and bats=='R';
    D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
    E = FOREACH D GENERATE $0 AS id, $7 AS hits;
    F = JOIN E BY id, C BY id; 
    -- Generate only the columns you need, then try DUMP F1 to check that you get output
    F1 = FOREACH F GENERATE E::id  as id, E::hits as hits;
    G = GROUP F1 BY id;
    H = FOREACH G GENERATE FLATTEN(group) as ID , SUM(F1.hits);
    DUMP H;
    

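    Notice the line H = FOREACH G GENERATE FLATTEN(group) as ID, SUM(F1.hits); that is where the error in your code is. After a GROUP, the bag is named after the relation that was grouped (here F1), so the field inside it is referenced as F1.hits. Also note that the fields of a joined relation are referenced with the :: prefix, as in E::id, not E.id.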
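
As a minimal sketch of the same bag-reference rule in isolation, here is a group-and-sum on a single relation. The file name hits.csv and the aliases data, grouped and totals are hypothetical, chosen only for illustration. Declaring a schema in the LOAD is an optional extra that gives hits an explicit int type instead of bytearray.

```pig
-- Hypothetical input: a comma-separated file with two columns, id and hits
data = LOAD 'hits.csv' USING PigStorage(',') AS (id:chararray, hits:int);

-- After GROUP, each row holds the key (named "group") and a bag named "data"
grouped = GROUP data BY id;
DESCRIBE grouped;

-- Reference the field inside the bag as <bag name>.<field name>
totals = FOREACH grouped GENERATE group AS id, SUM(data.hits) AS total_hits;
DUMP totals;
```

Running DESCRIBE right after the GROUP shows the bag name to use inside SUM(), which is the same check that DESCRIBE G provides in the question.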