
Convert elements of pig tuple to rows


I am trying to convert my input data, which looks like this:

Id,Name,Types,Code
1, A, a1;a2;a3, 101
2, B, b1;b2, 202
...

into a flattened structure, where the types are split into individual rows, like this:

1, A, a1, 101
1, A, a2, 101
1, A, a3, 101
2, B, b1, 202
2, B, b2, 202
... 

Here is what I have tried so far: STRSPLIT gives me a tuple, which I then try to convert to a bag so that I can FLATTEN it into individual rows.

input_data = LOAD '/user/gjhawar/latestSkillMappedEn.csv' USING PigStorage('|') AS
(
id : chararray,
name : chararray,
type: chararray,
code : chararray);

a = LIMIT input_data 10;

b = FOREACH a GENERATE (id, name, code), BagToString(TOBAG(STRSPLIT (type,'\\u003B',100)), ' ') as newCategoryName:chararray;

Solution

  • A literal semicolon is problematic as a delimiter. Replace it with another character, then tokenize on that character and flatten the result.

    http://www.hadooplessons.info/2015/01/word-count-in-pig-latin.html

    flattened_input_data = FOREACH a GENERATE id, name, FLATTEN(TOKENIZE(REPLACE(type, '\\u003B', '|'), '|')) AS newCategoryName, code;
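
    As a possible alternative to the REPLACE/TOKENIZE approach, here is a minimal sketch that splits straight into a bag and so avoids the tuple-to-bag conversion attempted in the question. It assumes Pig 0.14 or later, where the STRSPLITTOBAG built-in is available, and it assumes the relation a has the fields (id, name, type, code) declared in the question's LOAD statement. The names rows and type_item are made up for illustration.

```pig
-- Sketch only, assuming Pig 0.14+ (STRSPLITTOBAG) and the question's schema.
-- The semicolon is written as the escape \u003B, as in the question,
-- so that no literal semicolon appears inside the statement.
-- A limit of 0 means the pattern is applied as many times as possible.
rows = FOREACH a GENERATE
    id,
    name,
    FLATTEN(STRSPLITTOBAG(type, '\\u003B', 0)) AS type_item,
    code;

DUMP rows;
```

    FLATTEN on a bag emits one output row per element, with id, name and code repeated on each row, which is the shape the question asks for. If the values carry leading spaces, as in the sample data, wrapping the fields in TRIM may also be needed.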