hadoop apache-pig

Generate unique cross in Pig


I have a problem: I don't understand how to generate a unique "cross" (every unordered pair, once) for my input. Here is my input:

A, B, C

I would like to get:

A,B
A,C
B,C

Which UDF (from DataFu or Piggybank, for example) can I use to solve this problem?


Solution

  • If your input has one value per row, like

    A
    B
    C
    

    and you want the output:

    A,B
    A,C
    B,C
    

    you can cross the input with itself and then filter out the self-pairs and mirrored duplicates. For example:

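    -- Load the same data twice so it can be crossed with itself
    input1 = load 'your_path' as (key: chararray);
    input2 = load 'your_path' as (key: chararray);
    cross_results = cross input1, input2;
    -- Keep each unordered pair once; this also drops self-pairs such as (A,A)
    final_results = filter cross_results by input1::key < input2::key;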
    

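    If A, B, and C are instead the elements of a bag in a single record, you can use flatten. Two flattens in the same generate produce the cross product of the bag with itself. For example: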

    -- Assume your input x is something like {A, B, C} in one row
    y = foreach x generate flatten($0) as f1, flatten($0) as f2;
    final_results = filter y by f1 < f2;
    

    Since your description is not very detailed, these are the only solutions I can offer. You may need to adapt them to your actual data.
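
    If the input is literally one text line such as "A, B, C", the flatten approach can be combined with TOKENIZE. The following is only a minimal sketch under that assumption: 'your_path' and 'your_output_path' are placeholders, and the relation and field names (lines, items, f1, f2) are made up for illustration.

    ```pig
    -- Assumption: each input line holds one list, e.g. "A, B, C"
    lines = LOAD 'your_path' USING TextLoader() AS (line: chararray);

    -- TOKENIZE splits on its default delimiters (which include space and comma)
    -- and returns a bag of one-field tuples
    x = FOREACH lines GENERATE TOKENIZE(line) AS items;

    -- Two FLATTENs in one GENERATE give the cross product of the bag with itself
    y = FOREACH x GENERATE FLATTEN(items) AS f1, FLATTEN(items) AS f2;

    -- Keep each unordered pair once and drop self-pairs
    final_results = FILTER y BY f1 < f2;

    -- Write the pairs comma-separated (placeholder output path)
    STORE final_results INTO 'your_output_path' USING PigStorage(',');
    ```

    For the line above, the filter should leave only the pairs A,B, A,C and B,C, although the order of the output rows is not guaranteed. If a line can contain the same value more than once, you would probably also want a DISTINCT on final_results before storing.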