I have an input file like the one below.
1,Cust_name1,addr_type,Addr1
1,Cust_name1,addr_type,Addr2
2,Cust_name3,addr_type,Addr1
2,Cust_name3,addr_type,Addr3
I want to convert this to Avro format.
The output should look like this:
1,Cust_name1,{(addr_type,Addr1),(addr_type,Addr2)}
2,Cust_name3,{(addr_type,Addr1),(addr_type,Addr3)}
For each customer, I want to generate a single Avro message, with the repeated elements stored in an array.
GROUP by id and customer name. To store the result in Avro format, use AvroStorage, which is available in piggybank.jar, and register the jar in your script. It can be downloaded from here
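-- Make AvroStorage available to the script
REGISTER /path/piggybank.jar;
-- Load the comma-separated input; each field is declared as name:type
A = LOAD 'data.txt' USING PigStorage(',') AS (id:int, name:chararray, addrtype:chararray, addr:chararray);
-- One output record per (id, name); the matching rows are collected into a bag
B = GROUP A BY (id, name);
STORE B INTO '/path/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();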
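Note that GROUP produces records of the form (group, A), where group is the (id, name) tuple and the bag A still holds all four original fields. To get closer to the layout asked for in the question, a projection step could be added before the STORE. The following is a minimal sketch, assuming the same paths and field names as the script above; the alias C and the field name addresses are made up for illustration, and the Avro schema that AvroStorage derives from it has not been verified here.

```
REGISTER /path/piggybank.jar;
A = LOAD 'data.txt' USING PigStorage(',') AS (id:int, name:chararray, addrtype:chararray, addr:chararray);
B = GROUP A BY (id, name);
-- Flatten the group key back into two top-level fields and keep only
-- the address columns inside the bag (hypothetical names: C, addresses)
C = FOREACH B GENERATE FLATTEN(group) AS (id, name), A.(addrtype, addr) AS addresses;
-- Print the resulting schema to check the shape before storing
DESCRIBE C;
STORE C INTO '/path/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```

Running DESCRIBE C before the STORE is a cheap way to confirm that each customer ends up as one record with a bag of (addrtype, addr) tuples.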