I have a problem: I don't understand how to generate a unique "cross" (every pair exactly once) from my input. Here is my input:
A, B, C
I would like to get:
A,B
A,C
B,C
What UDF (data-fu, piggybank) can I use to solve this problem?
If your input looks like this:
A
B
C
and you want this output:
A,B
A,C
B,C
you can use cross to pair every row with every other row, then filter to keep each pair only once. For example:
input1 = load 'your_path' as (key: chararray);
input2 = load 'your_path' as (key: chararray);
cross_results = cross input1, input2;
final_results = filter cross_results by input1::key < input2::key;
If "A,B,C" is a single bag within one record, you can use flatten instead. For example:
-- Assume your input x is something like {A, B, C} in one row
y = foreach x generate flatten($0) as f1, flatten($0) as f2;
final_results = filter y by f1 < f2;
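If your input is really a single text line such as "A, B, C" rather than a bag, you would first have to turn it into a bag. Here is a minimal sketch of how that could look, assuming a hypothetical file 'input.txt' that holds that one line. The relation names (lines, items, pairs, unique_pairs) are made up for illustration, and I have not run this against your data:

```pig
-- Assumption: 'input.txt' is a hypothetical file containing the single line: A, B, C
-- TextLoader reads each line as one chararray field.
lines = load 'input.txt' using TextLoader() as (line: chararray);

-- TOKENIZE with no delimiter argument splits on spaces and commas (among others),
-- producing a bag of single-field tuples, one per token.
items = foreach lines generate TOKENIZE(line) as tokens;

-- Flattening the same bag twice in one generate yields every (f1, f2) combination.
pairs = foreach items generate flatten(tokens) as f1, flatten(tokens) as f2;

-- Keeping only f1 < f2 drops self-pairs (A,A) and mirrored pairs (B,A).
unique_pairs = filter pairs by f1 < f2;

dump unique_pairs;
```

The f1 < f2 filter is a plain string comparison, so it only removes duplicates correctly when the values in the bag are distinct; repeated values would need a distinct step first.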
Since your description is not very detailed, I can only offer the solutions above. You may need to adapt them to your actual data.