I am hoping someone can help me create a Java UDF that will take this input, spread across three text files:
Montreal, 5 3 10 9 8
Toronto, 7 2 2 3 4 4
Edmonton, 3 3 1 1 7
Montreal, 2 2 9
and return the following output bags:
{(Montreal,5),(Montreal,3),(Montreal,10),(Montreal,9),(Montreal,8),(Montreal,2),(Montreal,2),(Montreal,9)}
{(Toronto,7),(Toronto,2),(Toronto,2),(Toronto,3),(Toronto,4),(Toronto,4)}
I am fairly new to Java, and any help you can provide is greatly appreciated. Thank you.
If you're using Pig 0.14 or later, which supports STRSPLITTOBAG, you can do this in plain Pig Latin without writing a UDF:
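-- Split each line at the comma: the place, then the space-separated numbers.
A = load 'test.input' using PigStorage(',') as (place:chararray, numbers:chararray);
-- TRIM guards against the space after the comma producing an empty first token.
B = FOREACH A GENERATE place, FLATTEN(STRSPLITTOBAG(TRIM(numbers))) as number;
C = FOREACH B GENERATE place, (chararray) number;
-- Collect all (place, number) pairs for the same place into one bag.
D = GROUP C by place;
E = FOREACH D generate C; -- dropping group field
dump E;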
Output:
({(Toronto,2),(Toronto,2),(Toronto,7),(Toronto,4),(Toronto,4),(Toronto,3)})
({(Edmonton,7),(Edmonton,1),(Edmonton,1),(Edmonton,3),(Edmonton,3)})
({(Montreal,9),(Montreal,2),(Montreal,2),(Montreal,8),(Montreal,9),(Montreal,10),(Montreal,3),(Montreal,5)})
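Since the question asked about Java, here is a rough plain-Java sketch of the same split-then-group logic, in case it helps to see the steps outside Pig. It deliberately does not use the Pig UDF API, so it is not a drop-in UDF; the class and method names (`CityBags`, `group`, `format`) are made up for illustration, and it assumes every line looks like `place, n1 n2 n3`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Pig.
class CityBags {

    // Groups "(place,number)" pairs by place, keeping first-seen order.
    static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> bags = new LinkedHashMap<>();
        for (String line : lines) {
            // Split at the first comma only: place on the left, numbers on the right.
            String[] parts = line.split(",", 2);
            if (parts.length < 2) {
                continue; // assumption: silently skip lines without a comma
            }
            String place = parts[0].trim();
            String numbers = parts[1].trim();
            if (numbers.isEmpty()) {
                continue; // nothing to add for this line
            }
            List<String> bag = bags.computeIfAbsent(place, k -> new ArrayList<>());
            // One or more whitespace characters separate the numbers.
            for (String n : numbers.split("\\s+")) {
                bag.add("(" + place + "," + n + ")");
            }
        }
        return bags;
    }

    // Renders one bag in the {(a,1),(a,2)} style used in the question.
    static String format(List<String> bag) {
        return "{" + String.join(",", bag) + "}";
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Montreal, 5 3 10 9 8",
                "Toronto, 7 2 2 3 4 4",
                "Edmonton, 3 3 1 1 7",
                "Montreal, 2 2 9");
        for (List<String> bag : group(input).values()) {
            System.out.println(format(bag));
        }
    }
}
```

Unlike the Pig output above, this keeps the numbers in input order, because it appends to a list instead of going through a GROUP, which does not guarantee order within a bag.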