
Outputting a tuple with a space between two values in Pig


I have been using Pig to filter a large file that contains tab-separated data. Each row in the file has the form fname lname age:

Bill Gates 50
Warren Buffet 100
Elon Musk 80
Jack Dorsey 10

I want to keep only the rows where age > 50 and store the resulting data in (fname lname) form in a file using Pig.

Here is the code I'm using:

data = LOAD 'persons.txt' AS (fname:chararray, lname:chararray, age:int);
data1 = FILTER data BY age > 50;
data2 = FOREACH data1 GENERATE (fname, lname);
STORE data2 INTO 'result.txt';

With this code, I am getting the following output:

(Warren,Buffet)
(Elon,Musk)

This is not the output I want. Instead, I want the following:

(Warren Buffet)
(Elon Musk)

To get this output, I tried FOREACH data1 GENERATE (fname lname), without a comma between fname and lname, but it fails with the error: Syntax error, unexpected symbol at or near fname.

How can I get the correct output?

Note: I am running Pig on a Hadoop cluster, not locally.


Solution

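  • GENERATE (fname, lname) builds a tuple, and Pig writes tuple fields separated by commas. To get a single value instead, use CONCAT with a space in between fname and lname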

    data2 = FOREACH data1 GENERATE CONCAT(fname,' ',lname);
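
For context, here is a sketch of how the whole script might look with that change. Two things in it are assumptions rather than part of the original answer. First, it nests two-argument CONCAT calls, which should also work on older Pig releases where CONCAT may not accept more than two arguments. Second, it wraps the result in TOTUPLE, on the assumption that the parentheses shown in the desired output are wanted; without that step each stored line would be plain text such as Warren Buffet. The aliases fullname and data3 are introduced here for illustration only.

```pig
-- Load the tab-separated rows (the default loader splits on tabs).
data  = LOAD 'persons.txt' AS (fname:chararray, lname:chararray, age:int);

-- Keep only the rows where age is greater than 50.
data1 = FILTER data BY age > 50;

-- Join the two names with a space. The nested two-argument form is used
-- in case the installed Pig version does not support CONCAT with three arguments.
data2 = FOREACH data1 GENERATE CONCAT(fname, CONCAT(' ', lname)) AS fullname;

-- Optional (assumption): wrap the single field in a tuple so the stored
-- lines keep their surrounding parentheses.
data3 = FOREACH data2 GENERATE TOTUPLE(fullname);

STORE data3 INTO 'result.txt';
```

Note that STORE writes a directory named result.txt containing part files, not a single file, so the output has to be read from those part files.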