I would like to subtract one group from another in Pig. I want to do exactly what the "comm -23" command does in bash, but I can't find any documentation about it online.
So for example: GROUP A is: 1 2 3 4 5 6
GROUP B is: 3 4 5 6 7
And the output I need is: GROUP A - GROUP B: 1 2
As WinnieNicklaus suggested, DataFu is a good resource. I wrote the SetDifference UDF for exactly this use case. Assuming you are working with bags, it will do what you need. Note that both input bags must be sorted before they are passed to SetDifference.
Example from the documentation:
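define SetDifference datafu.pig.sets.SetDifference();
-- input:
-- ({(1),(2),(3),(4),(5),(6)},{(3),(4)})
-- INPUT and OUTPUT are reserved words in Pig Latin, so other alias names are used here
data = LOAD 'input' AS (B1:bag{T:tuple(val:int)},B2:bag{T:tuple(val:int)});
result = FOREACH data {
    -- SetDifference requires both input bags to be sorted
    sorted_b1 = ORDER B1 BY val ASC;
    sorted_b2 = ORDER B2 BY val ASC;
    -- elements of sorted_b1 that do not appear in sorted_b2
    GENERATE SetDifference(sorted_b1, sorted_b2);
}
-- output:
-- ({(1),(2),(5),(6)})
DUMP result;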
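If A and B are top-level relations (one value per record, as in the question) rather than two bags inside a single record, a common alternative that needs no UDF is COGROUP followed by an IsEmpty filter. This is a sketch, not part of the DataFu documentation; the file names `groupA.txt` and `groupB.txt` are hypothetical, each assumed to hold one integer per line.

```pig
-- Hypothetical input files: one integer per line in each.
A = LOAD 'groupA.txt' AS (val:int);
B = LOAD 'groupB.txt' AS (val:int);

-- Bring together the records of A and B that share the same val.
C = COGROUP A BY val, B BY val;

-- Keep only keys that never occur in B (A minus B).
-- Keys that occur only in B are dropped too, because their B bag is not empty.
D = FILTER C BY IsEmpty(B);

-- Re-emit the original A records for the surviving keys.
E = FOREACH D GENERATE FLATTEN(A);

DUMP E;
```

With the question's data, E should contain (1) and (2), although Pig does not guarantee output order. Unlike comm -23 (and unlike SetDifference), this approach does not require sorted input.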