Pig can't load bag


I am trying to load a bag data type into a Pig relation, but the bag field comes out null instead.

Sample input:

A000,B000,C000,1.0,1-1-14,3-31-14,{(A101,1-Jan-2014,0.03,0.04)}
A001,B001,C001,10.0,1-1-14,3-31-14,{(A101,1-Jan-2014,0.03,0.045)}
A002,B002,C002,100.0,1-1-14,3-31-14,{(A101,1-Jan-2014,0.03,0.04)}

Pig Script:

raw = LOAD 'input/meh.log' USING PigStorage(',') AS (PID, FUNDID, GICID, balance, startDate, endDate, rates:bag{t:tuple(t1,t2,t3,t4)});
DUMP raw;

Output:

(A000,B000,C000,1.0,1-1-14,3-31-14,)
(A001,B001,C001,10.0,1-1-14,3-31-14,)
(A002,B002,C002,100.0,1-1-14,3-31-14,)
                                    ^Bag values should be here

What am I doing wrong? I've tried removing the bag/tuple declarations from the LOAD statement, and still nothing. I used this same approach when working through the bag tutorial that came with Pig, and it worked just fine there.

UPDATE: If I set the bag input so that each tuple has one value, then this script works. I'm starting to think this may be an issue with my version of Pig (0.12.2). I had to build Pig using Ant so that it can run on Hadoop 2.3. Thoughts?


Solution

  • Reformat the data so that the top-level fields are separated by tabs instead of commas:

    A000    B000    C000    1   1-1-14  3-31-14 {(101,1-Jan-2014,0.03,0.04)}
    A001    B001    C001    10  1-1-14  3-31-14 {(101,1-Jan-2014,0.03,0.04)}
    A002    B002    C002    100 1-1-14  3-31-14 {(101,1-Jan-2014,0.03,0.04)}
    

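    With the top-level fields separated by tabs, the load works. I had the delimiter set to ',', and that was the problem: as far as I can tell, PigStorage splits each line on the delimiter before it parses bags and tuples, so the commas inside the tuple were treated as field separators. The seventh field then held only the fragment {(A101, which cannot be parsed as a bag and loads as null. That would also explain why a bag of single-value tuples loaded fine with the comma delimiter. If your bags contain tuples with more than one value, either set the field delimiter to anything but ',' or don't set it at all, in which case PigStorage uses its default, a tab.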
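
To see why the comma delimiter breaks the bag, here is a small Python sketch of delimiter-first splitting. It is only a simplified model of the behaviour described above, not PigStorage's actual code; the function name split_fields and the variable names are made up for illustration, and the sample lines are the first record from the question.

```python
# Simplified model of delimiter-first splitting.
# Assumption: this mimics how the loader cuts a line into fields before any
# bag/tuple parsing happens. It is NOT PigStorage's real implementation.

def split_fields(line, delimiter):
    """Split a record on every occurrence of the delimiter, ignoring brackets."""
    return line.split(delimiter)

# First record from the question, once with commas and once with tabs.
comma_line = "A000,B000,C000,1.0,1-1-14,3-31-14,{(A101,1-Jan-2014,0.03,0.04)}"
tab_line = "A000\tB000\tC000\t1.0\t1-1-14\t3-31-14\t{(A101,1-Jan-2014,0.03,0.04)}"

comma_fields = split_fields(comma_line, ",")
tab_fields = split_fields(tab_line, "\t")

# With ',' the bag is cut into pieces, so field 7 is only a fragment.
print(len(comma_fields), comma_fields[6])  # 10 {(A101

# With tabs the bag arrives intact as the seventh field.
print(len(tab_fields), tab_fields[6])      # 7 {(A101,1-Jan-2014,0.03,0.04)}
```

With the comma delimiter the schema's seventh field receives a string that is not a well-formed bag, which matches the empty last column in the DUMP output from the question.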