Merging two files using Pig


I have two files that I want to merge. Their format is as follows. First file (f.txt):

Silver 1001
Gold  8009

Second file (s.txt):

Apple 100
Banana 200

I want the final merged file to look like this:

Silver 1001
Gold  8009
Apple 100
Banana 200

I've been trying to use the following code to do this:

data1 = LOAD 'f.txt' AS (name:chararray, num:int);
data2 = LOAD 's.txt' AS (name:chararray, num:int);
data3 = UNION data1, data2;
data4 = GROUP data3 BY name;
data5 = FOREACH data4 GENERATE group, data3.num;
STORE data5 INTO 'final.txt';

But with this code, the output comes out like this:

Silver {(1001)}
Gold {(8009)}
Apple {(100)}
Banana {(200)}

I want the output to look like what I showed above. Any suggestions on how I can achieve that?


Solution

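  • There is no need to group the data, since the required output is a simple merge of two files with the same schema. GROUP collects each name's num values into a bag, which is why the output shows {(1001)} instead of 1001. A plain UNION is all that is needed, unless you have duplicate names whose num values need to be added together, in which case you would have to group and sum.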

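    -- The schema after AS must be enclosed in parentheses.
    -- LOAD defaults to PigStorage, which splits fields on tabs.
    data1 = LOAD 'f.txt' AS (name:chararray, num:int);
    data2 = LOAD 's.txt' AS (name:chararray, num:int);

    -- UNION concatenates the two relations; row order is not guaranteed.
    data3 = UNION data1, data2;

    -- STORE writes a directory named 'final.txt' containing part files.
    STORE data3 INTO 'final.txt';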
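
    If names can repeat across the two files and their num values should be added together, the group-and-sum case mentioned above might look like the sketch below. It assumes tab-separated input (the PigStorage default); the relation names grouped and totals and the output path 'final_summed' are made up for illustration.

    data1 = LOAD 'f.txt' AS (name:chararray, num:int);
    data2 = LOAD 's.txt' AS (name:chararray, num:int);
    data3 = UNION data1, data2;

    -- GROUP produces one tuple per name, holding a bag of that name's rows.
    grouped = GROUP data3 BY name;

    -- SUM collapses the bag of num values into a single number per name.
    totals = FOREACH grouped GENERATE group AS name, SUM(data3.num) AS total;

    -- Hypothetical output directory, chosen only for this example.
    STORE totals INTO 'final_summed';

    Because SUM reduces each bag to a scalar, the stored rows are plain name and total pairs rather than the {(…)} bags seen in the question.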