I'm running the following Pig script, but I'm getting ERROR 1066: Unable to open iterator for alias H.
A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth,$7 AS deathYear;
C = FILTER B BY birthMonth==10 and deathYear==2011 and bats=='R';
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
E = FOREACH D GENERATE $0 AS id, $7 AS hits;
F = JOIN E BY id, C BY id;
G = GROUP F BY E.id;
H = FOREACH G GENERATE $0, SUM($1.hits);
DUMP H;
When I describe G, I get:
G: {group: bytearray,F: {(E::id: bytearray,E::hits: int,C::id: bytearray,
C::first: bytearray,C::last: bytearray,C::bats: bytearray,
C::birthMonth: bytearray,C::deathYear: bytearray)}}
I've tried a ton of things inside of the SUM() function: F:hits, F.hits, F.E.hits, E.hits, E:hits but I don't know how I'm supposed to reference the tuple within the bag.
Thanks for ideas.
I suggest you try this (I haven't tested it):
A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth,$7 AS deathYear;
C = FILTER B BY birthMonth==10 and deathYear==2011 and bats=='R';
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
E = FOREACH D GENERATE $0 AS id, $7 AS hits;
F = JOIN E BY id, C BY id;
-- Project only the columns you need; you can DUMP F1 here to check the output
F1 = FOREACH F GENERATE E::id as id, E::hits as hits;
G = GROUP F1 BY id;
H = FOREACH G GENERATE FLATTEN(group) as ID , SUM(F1.hits);
DUMP H;
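Notice the line H = FOREACH G GENERATE FLATTEN(group) as ID , SUM(F1.hits); that is where your code goes wrong. Inside the FOREACH, the bag is referenced by the name of the relation that was grouped (F1 here), and the field by its alias within that bag. After a JOIN the fields are prefixed with the source relation using ::, so E.id in your GROUP statement is not the joined column; Pig most likely reads it as a scalar reference to relation E, which fails at runtime. Projecting E::id and E::hits into plain names before grouping avoids the ambiguity.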
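If you would rather keep your original script and skip the extra projection, a smaller change along these lines may also work. This is an untested sketch that assumes the schema shown by your DESCRIBE G output and continues from your F = JOIN E BY id, C BY id; line. The output names id and total_hits are only illustrative.

```pig
-- After the JOIN, fields carry the source relation as a prefix (E::id, C::id, ...),
-- so group on the disambiguated name rather than E.id.
G = GROUP F BY E::id;

-- The bag inside G is named after the grouped relation (F);
-- project the disambiguated hits field out of it and sum it.
H = FOREACH G GENERATE group AS id, SUM(F.E::hits) AS total_hits;

DUMP H;
```

Running DESCRIBE F before the GROUP is a quick way to confirm the exact field names available after the join.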