Objective : To write each unique group key as a folder name, with the contents of its bag as the records in that folder.
File : employee.txt
#JoiningDate Employee Id Employee Name
20140302 1 A
20140302 2 B
20140302 3 C
20140303 4 D
20140303 5 E
20140303 6 F
Pig script :
X = load 'employee.txt' using PigStorage('\t') as (joining_date:chararray, employee_id:long, employee_name:chararray);
Y = group X by joining_date;
Output of this would be (Y) :
(20140302, {(20140302,1,A), (20140302,2,B), (20140302,3,C)})
(20140303, {(20140303,4,D), (20140303,5,E), (20140303,6,F)})
Objective is to have two folders in the output path :
1. outputfolder/20140302 : having three records
20140302,1,A
20140302,2,B
20140302,3,C
2. outputfolder/20140303 : having three records
20140303,4,D
20140303,5,E
20140303,6,F
Tried :
store Y into 'outputfolder' using org.apache.pig.piggybank.storage.MultiStorage('outputfolder', '0', 'none', ',');
Seeing the result as below, with the whole grouped tuple written instead of the individual records :
1. outputfolder/20140302/20140302-0
(20140302, {(20140302,1,A), (20140302,2,B), (20140302,3,C)})
2. outputfolder/20140303/20140303-0
(20140303, {(20140303,4,D), (20140303,5,E), (20140303,6,F)})
One option could be to just flatten the bag before the store command.
X = load 'employee.txt' using PigStorage('\t') as (joining_date:chararray, employee_id:long, employee_name:chararray);
Y = group X by joining_date;
Z = FOREACH Y GENERATE FLATTEN($1);
store Z into 'outputfolder' using org.apache.pig.piggybank.storage.MultiStorage('outputfolder', '0', 'none', ',');
Output for the first key will be stored in the outputfolder/20140302 folder (and likewise outputfolder/20140303 for the second key), in a file whose name starts with something like 20140302-0,000
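Since MultiStorage already splits records by the field index passed as its second argument, the group and flatten steps may not be needed at all. A minimal sketch of that variant, assuming the piggybank jar is available (the jar path below is hypothetical) and that no other per-group processing is required:

```pig
-- Assumption: piggybank.jar lives at this hypothetical path; adjust for your install.
REGISTER /path/to/piggybank.jar;

X = load 'employee.txt' using PigStorage('\t') as (joining_date:chararray, employee_id:long, employee_name:chararray);

-- Field '0' (joining_date) picks the sub-folder for each record,
-- so the ungrouped relation can be stored directly.
store X into 'outputfolder' using org.apache.pig.piggybank.storage.MultiStorage('outputfolder', '0', 'none', ',');
```

This should give the same per-date folders as the flatten approach, while skipping the reduce phase that the group introduces.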