Apache Pig only loads the first nested tuple


I am using the exact sample from the official documentation:

I have data.txt:

(3,8,9) (mary,19)
(1,4,7) (john,18)
(2,5,8) (joe,18)

I run:

A = LOAD 'data.txt' AS (F:tuple(f1:int,f2:int,f3:int),T:tuple(t1:chararray,t2:int));
dump A;

I always get:

((3,8,9),)
((1,4,7),)
((2,5,8),)

The second nested tuple never gets loaded. I tried both 0.16.0 and 0.17.0.


Solution

  • The problem is most likely the data file itself. Pig's default load function, PigStorage, uses tab as the field delimiter, so the two tuples on each line must be separated by a tab. If they are separated by a space instead, the LOAD statement has to name that delimiter explicitly.

    a) With a tab (\t) as the delimiter:

    grunt> A = LOAD '/home/ec2-user/data' AS (F:tuple(f1:int,f2:int,f3:int),T:tuple(t1:chararray,t2:int));
    grunt> DESCRIBE A;
    A: {F: (f1: int,f2: int,f3: int),T: (t1: chararray,t2: int)}
    grunt> dump A;
    ((3,8,9),(mary,19))
    ((1,4,7),(john,18))
    ((2,5,8),(joe,18))
    

    b) With a single space ( ) as the delimiter, the second tuple is not loaded:

    grunt> A = LOAD '/home/ec2-user/data' AS (F:tuple(f1:int,f2:int,f3:int),T:tuple(t1:chararray,t2:int));
    grunt> DESCRIBE A;
    A: {F: (f1: int,f2: int,f3: int),T: (t1: chararray,t2: int)}
    grunt> dump A;
    ((3,8,9),)
    ((1,4,7),)
    ((2,5,8),)
    

    Use PigStorage(' ') if you still want to keep the space as the delimiter in the file:

    grunt> A = LOAD '/home/ec2-user/data' USING PigStorage(' ') AS (F:tuple(f1:int,f2:int,f3:int),T:tuple(t1:chararray,t2:int));
    grunt> DESCRIBE A;
    A: {F: (f1: int,f2: int,f3: int),T: (t1: chararray,t2: int)}
    grunt> dump A;
    ((3,8,9),(mary,19))
    ((1,4,7),(john,18))
    ((2,5,8),(joe,18))
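
To see why the delimiter matters, here is a minimal Python sketch of the field-splitting step only. It is not Pig's actual loader code, and the helper names `split_fields` and `tuples_to_tab_separated` are made up for illustration. It shows that a space-separated line yields a single field when split on tab, and how the existing file could be rewritten to be tab-separated instead of changing the LOAD statement.

```python
import re


def split_fields(line, delimiter="\t"):
    """Split one record into top-level fields on the given delimiter.

    Hypothetical helper: it imitates only the splitting step, not how Pig
    then parses each field into a tuple.
    """
    return line.rstrip("\n").split(delimiter)


def tuples_to_tab_separated(line):
    """Replace the space(s) between adjacent tuples, ') (', with one tab."""
    return re.sub(r"\)[ ]+\(", ")\t(", line)


record = "(3,8,9) (mary,19)"

# Split on tab (the default): the whole line is one field, so there is
# nothing left to fill the second column.
print(split_fields(record))

# Split on space: two fields, one per tuple.
print(split_fields(record, " "))

# After converting the line to tab-separated, the default split also works.
print(split_fields(tuples_to_tab_separated(record)))
```

Applying something like `tuples_to_tab_separated` to every line of data.txt would let the original LOAD statement work unchanged. Keeping the file as it is and using PigStorage(' ') is the alternative.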