I have the data set below (pipe-delimited, with columns key and value):
key1|10
key1|20
key1|30
key2|50
key2|70
I need to add a new column that holds, for each key, the maximum of the "value" column.
The expected output is:
key1|10|30
key1|20|30
key1|30|30
key2|50|70
key2|70|70
Below is my Pig script, but I'm running into issues:
A = LOAD 'input.txt' using PigStorage('|');
B = foreach A generate $0,$1,min($1);
grunt> A = LOAD 'input.txt' using PigStorage('|');
grunt> B = foreach A generate $0,$1,max($1);
2017-05-26 06:48:02,347 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve max using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
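The following code should do it. Note that Pig function names are case-sensitive: the built-in is MAX, not max, which is why you get ERROR 1070. You also need to GROUP the relation first before you can use aggregate functions like MAX, MIN, or AVG.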
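-- load the pipe-delimited input with a schema
A = LOAD 'input.txt' USING PigStorage('|') AS (id:chararray, val:int);
-- one bag of rows per key, then the max value of each bag
B = GROUP A BY id;
C = FOREACH B GENERATE group AS id, MAX(A.val) AS maxval;
-- join the per-key max back onto every original row
D = JOIN A BY id, C BY id;
E = FOREACH D GENERATE A::id, A::val, C::maxval;
DUMP E;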
Running this should give the following (row order may differ, since JOIN does not preserve input order):
(key1,30,30)
(key1,20,30)
(key1,10,30)
(key2,70,70)
(key2,50,70)
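To make clear what each Pig step computes, here is a plain-Python sketch (not Pig) of the same GROUP, MAX, JOIN logic, assuming the pipe-delimited sample data from the question. The function name add_group_max is hypothetical and used only for illustration.

```python
from collections import defaultdict

# Same records as input.txt in the question: one "key|value" pair per line.
RAW = """key1|10
key1|20
key1|30
key2|50
key2|70"""


def add_group_max(lines):
    """Append to every (key, value) row the maximum value for its key.

    Hypothetical helper mirroring the Pig pipeline:
    LOAD -> GROUP BY key -> MAX per group -> JOIN back onto the rows.
    """
    # LOAD ... USING PigStorage('|'): split on '|' and cast value to int.
    rows = [(k, int(v)) for k, v in (line.split("|") for line in lines)]

    # GROUP A BY id: collect the values that belong to each key.
    groups = defaultdict(list)
    for key, val in rows:
        groups[key].append(val)

    # FOREACH B GENERATE group AS id, MAX(A.val): one max per key.
    maxes = {key: max(vals) for key, vals in groups.items()}

    # JOIN A BY id, C BY id: attach the per-key max to every original row.
    return [(key, val, maxes[key]) for key, val in rows]


result = add_group_max(RAW.splitlines())
for key, val, maxval in result:
    print(f"{key}|{val}|{maxval}")
```

This prints key1|10|30, key1|20|30, key1|30|30, key2|50|70, key2|70|70, the format the question asked for. Unlike the Pig JOIN, the list comprehension keeps the input row order.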