Below is the data
col1,col2,col3,col4,col5
------------------------
10,20,30,40,dollar
20,30,40,50,dollar
20,30,10,50,dollar
61,62,63,64,dollar
61,62,63,64,pound
col1,col2,col3 will form the combination of unique keys. The use case is to filter the data based on col5. For the unique key combination we need to filter the record where col5 value is "dollar", only if the same combination has "pound" value.
The expected output is
col1,col2,col3,col4,col5
------------------------
10,20,30,40,dollar
20,30,40,50,dollar
20,30,10,50,dollar
61,62,63,64,pound
How to proceed further since there is no special operators in Pig like Hive.
A = load 'test1.csv' using PigStorage(',') as (col1:int,col2:int,col3:int,col4:int,col5:chararray);
B = FILTER A BY col5 == 'pound';
Get all the records with 'pound', then get all records with 'dollar' that does not match with the id combination with 'pound' in col5. Finally, marry them off ... UNION.
B = FILTER A BY col5 == 'pound';
C = JOIN A BY (col1,col2,col3) LEFT OUTER,B BY (col1,col2,col3);
D = FILTER C BY (B::col1 is null);
E = FOREACH D GENERATE A::col1,A::col2,A::col3,A::col4,A::col5;
F = UNION B,E;
DUMP F;
Output