
Reading tuples from a file in Pig Latin


This is an example from https://pig.apache.org/docs/r0.17.0/basic.html

cat data;
(3,8,9) (4,5,6)
(1,4,7) (3,7,5)
(2,5,8) (9,5,8)

 A = LOAD 'data' AS (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));

 DUMP A;
 ((3,8,9),(4,5,6))
 ((1,4,7),(3,7,5))
 ((2,5,8),(9,5,8))

I created a file tp.txt in maria_dev containing the same data:

(3,8,9) (4,5,6)
(1,4,7) (3,7,5)
(2,5,8) (9,5,8)

and loaded it with:

tp = LOAD 'tp.txt' as (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));

But when I run DUMP tp; in grunt, I get the following output:

((3,8,9),)
((1,4,7),)
((2,5,8),)

What am I doing wrong here?


Solution

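  • By default, LOAD uses PigStorage, which expects fields to be tab-separated. Your text file uses spaces, so each whole line is read as a single field: t1 is parsed from that field, and t2 has no field left to read, so it comes out null. To load the file without changing it, pass a space as the delimiter: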

    tp = LOAD 'tp.txt' USING PigStorage(' ') AS (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));
    

    Alternatively, replace the spaces in your text file with tabs and keep your original LOAD statement.
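A minimal sketch of the second option, assuming a POSIX shell and a local copy of tp.txt. The helper name to_tabs and the output name tp_tab.txt are hypothetical. If the file lives in HDFS rather than on the local disk, it would need to be fetched and uploaded again, as shown in the comments.

```shell
#!/bin/sh
# Hypothetical helper: copy input file $1 to output file $2, replacing every
# space with a tab so that PigStorage's default tab delimiter splits a line
# like "(3,8,9) (4,5,6)" into the two fields "(3,8,9)" and "(4,5,6)".
# Caveat: this replaces *all* spaces, so it assumes the tuples themselves
# contain no spaces (true for the sample data above).
to_tabs() {
    tr ' ' '\t' < "$1" > "$2"
}

# Usage (assumes a local tp.txt; tp_tab.txt is a hypothetical output name):
#   to_tabs tp.txt tp_tab.txt
# If the data is in HDFS, one way is to fetch it, convert, and overwrite it:
#   hdfs dfs -get tp.txt .
#   to_tabs tp.txt tp_tab.txt
#   hdfs dfs -put -f tp_tab.txt tp.txt
```

tr is used rather than sed because a \t escape in a sed replacement is a GNU extension and does not work with every sed, while tr handles '\t' portably.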