apache-pig, cross-join

Self cross-join in Pig is disregarded


Suppose one has data like this:

A = LOAD 'data' AS (a1:int,a2:int,a3:int);

DUMP A;
(1,2,3)
(4,2,1)

And then A is cross-joined with itself:

B = CROSS A, A;

DUMP B;
(1,2,3)
(4,2,1)

Why is the second A optimized out of the query?

Info: Pig version 0.11

== UPDATE ==

If I sort A first and cross A with the sorted relation, like this:

C = ORDER A BY a1;
D = CROSS A, C;

This gives the correct cross join.


Solution

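  • I think you have to load the data twice, under two different aliases, to get the result you want. Pig's documentation gives the same advice for self joins: load the same data multiple times under different aliases to avoid naming conflicts.

    For example: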

    A1 = LOAD 'data' AS (a1:int,a2:int,a3:int);
    A2 = LOAD 'data' AS (a1:int,a2:int,a3:int);
    B = CROSS A1, A2;
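
    As a rough sketch of what to expect, assuming the workaround behaves as described and 'data' holds the two rows from the question, dumping B should show every tuple of A1 paired with every tuple of A2. The tuple order in the output is not guaranteed.

    ```pig
    -- Load the same file under two distinct aliases, as in the answer above.
    A1 = LOAD 'data' AS (a1:int, a2:int, a3:int);
    A2 = LOAD 'data' AS (a1:int, a2:int, a3:int);

    -- Pair every tuple of A1 with every tuple of A2.
    B = CROSS A1, A2;

    DUMP B;
    -- Expected tuples for the two-row input (in some order):
    -- (1,2,3,1,2,3)
    -- (1,2,3,4,2,1)
    -- (4,2,1,1,2,3)
    -- (4,2,1,4,2,1)
    ```

    With 2 input rows the cross product has 2 x 2 = 4 tuples of 6 fields each, so a two-tuple result like the one in the question indicates that the second operand was dropped.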