Tags: hbase, apache-pig, user-defined-functions

Concatenate multiple records in Pig


I want to use Pig to concatenate all records that come from the same file. After loading the data with PigStorage and the '-tagFile' option, my data looks like this:

(filename, aaaaaaaaaaa)
(filename, bbbbbbbbbbbbbb)
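A relation with this shape could be produced by a LOAD like the following sketch. The input path 'input_dir' and the field name `line` are hypothetical placeholders. The sketch assumes the input lines contain no tab characters: PigStorage splits on tab here, and '-tagFile' prepends the source file name as the first field of every record.

```pig
-- Hypothetical input path. '-tagFile' prepends each record's source file name as field $0.
A = LOAD 'input_dir' USING PigStorage('\t', '-tagFile') AS (filename:chararray, line:chararray);
DUMP A;
```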

The result I want is:

(filename, aaaaaaaaaaabbbbbbbbbbbbbb)

Then I can store the data in HBase with the filename as the row key.

Any suggestions would be appreciated.


Solution

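  • GROUP the data by filename, then use BagToString to concatenate the values in each group's bag into a single string. The second argument of BagToString is the delimiter, so '' joins the values with no separator.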

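    -- Collect all records that share a filename into one bag
    B = GROUP A BY filename;
    -- Join the second field of every tuple in the bag, using an empty delimiter
    C = FOREACH B GENERATE group AS filename, BagToString(A.$1, '') AS content;
    DUMP C;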
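To finish the HBase step from the question, C could then be stored with Pig's built-in HBaseStorage. When storing, HBaseStorage uses the first field of each tuple (here the file name) as the row key and maps the remaining fields, in order, to the listed columns. The table name `files` and the column `content:data` are hypothetical. The sketch assumes the table already exists with a column family named `content` and that the HBase client jars are on Pig's classpath. Pig does not guarantee the order of tuples inside a bag, so the concatenated string may not follow the original line order. If order matters, each record needs a sortable key that can be used in a nested ORDER before calling BagToString.

```pig
-- Hypothetical HBase table 'files' with column family 'content' (must already exist).
-- HBaseStorage takes the first field (the filename) as the row key and writes
-- the second field (the concatenated string) to the column content:data.
STORE C INTO 'hbase://files'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('content:data');
```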