apache-pig

Multiple records to a single record in PIG


I have an input file as below.

1,Cust_name1,addr_type,Addr1
1,Cust_name1,addr_type,Addr2
2,Cust_name3,addr_type,Addr1
2,Cust_name3,addr_type,Addr3

I want to convert this to Avro format.

The output should look like this:

1,Cust_name1,{(addr_type,Addr1),(addr_type,Addr2)}
2,Cust_name3,{(addr_type,Addr1),(addr_type,Addr3)}

For each customer I want to generate a single Avro message, with the repeated elements stored in an array.


Solution

  • GROUP by id and customer name. To store the result in Avro format, use AvroStorage, which is available in piggybank.jar, and register the jar in your script (download it first if you do not already have it).

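    REGISTER /path/piggybank.jar;
    -- Load the comma-separated input; each field is declared as name:type.
    A = LOAD 'data.txt' USING PigStorage(',') AS (id:int, name:chararray, addrtype:chararray, addr:chararray);
    -- Collect all rows for the same customer into one bag.
    B = GROUP A BY (id, name);
    -- Flatten the group key and keep only the address fields in the bag.
    C = FOREACH B GENERATE FLATTEN(group) AS (id, name), A.(addrtype, addr) AS addresses;
    STORE C INTO '/path/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();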
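
To sanity-check the record shape that grouping by id and name should produce, the same logic can be sketched outside Pig. The following is a plain Python illustration, not Pig code: `group_addresses` is a hypothetical helper name, and the rows are copied from the sample input in the question. Python lists keep insertion order, whereas a Pig bag is unordered, so the order of addresses within a customer is not guaranteed in the real job.

```python
# Sample rows from the question: id, name, address type, address.
rows = [
    (1, "Cust_name1", "addr_type", "Addr1"),
    (1, "Cust_name1", "addr_type", "Addr2"),
    (2, "Cust_name3", "addr_type", "Addr1"),
    (2, "Cust_name3", "addr_type", "Addr3"),
]


def group_addresses(rows):
    """Return one (id, name, [(addr_type, addr), ...]) record per customer.

    Hypothetical helper that mimics grouping on (id, name) and keeping
    only the address fields in the nested collection.
    """
    grouped = {}
    for cust_id, name, addr_type, addr in rows:
        # The (id, name) pair plays the role of the Pig group key.
        grouped.setdefault((cust_id, name), []).append((addr_type, addr))
    return [(cust_id, name, addrs) for (cust_id, name), addrs in grouped.items()]


records = group_addresses(rows)
for record in records:
    print(record)
```

Each printed record corresponds to one Avro message: two scalar fields followed by an array of (address type, address) pairs.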