I would really appreciate it if someone could explain what I am doing wrong.
The script works for the most part, but when I use GROUP BY and dump the results, I get an error saying other_vertex_failure.
Sample data:
1,Apple,5.5
2,Orange,2.5
2,Orange,4.5
3,Kiwi,1.5
3,Kiwi,3.5
4,Banana,4.0
4,Banana,6.0
A = LOAD '/user/pig/apple.csv' USING PigStorage(','); -- this works
B = FOREACH A GENERATE $0 as ids:int, $1 as fruit:chararray,
    $2 as quan:int; -- this works
C = GROUP B BY ids; -- this works, gives no error
But when I run dump C; it throws an error.
Is using names with positional parameters a bad idea in Pig?
You can assign aliases and types to your fields in the LOAD statement itself. Since you haven't done that, the fields default to type bytearray, and when Pig attempts to cast bytearray to int, it throws a ClassCastException.
A = LOAD '/user/pig/apple.csv' USING PigStorage(',') as (ids:int, fruit:chararray, quan:float);
C = GROUP A BY ids;
dump C;
(1,{(1,Apple,5.5)})
(2,{(2,Orange,4.5),(2,Orange,2.5)})
(3,{(3,Kiwi,3.5),(3,Kiwi,1.5)})
(4,{(4,Banana,6.0),(4,Banana,4.0)})
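If you would rather keep the positional references, an explicit cast in the FOREACH is another option that should avoid the same problem. This is only a sketch, assuming the same file and field layout as above; I have typed quan as float because the sample values contain decimals. DESCRIBE is there just to confirm the schema before grouping.

```pig
-- Load without a schema; every field starts out as bytearray
A = LOAD '/user/pig/apple.csv' USING PigStorage(',');

-- Cast each positional field explicitly, then name it
B = FOREACH A GENERATE (int)$0 AS ids, (chararray)$1 AS fruit, (float)$2 AS quan;

-- Check that the schema is what you expect before grouping
DESCRIBE B;

C = GROUP B BY ids;
DUMP C;
```

With the cast written out, the conversion happens in the FOREACH itself instead of being implied by the alias declaration.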