I have been using Pig to filter a large file of tab-separated data. Each line of the file has the form fname lname age, for example:
Bill Gates 50
Warren Buffet 100
Elon Musk 80
Jack Dorsey 10
I want to keep only the rows where age > 50 and store the result in (fname lname) form in a file using Pig.
Here is the code I am using:
data = LOAD 'persons.txt' AS (fname:chararray, lname:chararray, age:int);
data1 = FILTER data BY age > 50;
data2 = FOREACH data1 GENERATE (fname, lname);
STORE data2 INTO 'result.txt';
With this code, I am getting the following output:
(Warren,Buffet)
(Elon,Musk)
This is not the output I want. Instead, I want the following:
(Warren Buffet)
(Elon Musk)
To get this output, I tried FOREACH data1 GENERATE (fname lname) without a comma between fname and lname, but it fails with the error: Syntax error, unexpected symbol at or near fname.
Can anybody help me get the correct output?
Note: I am running Pig on a Hadoop cluster, not locally.
Use CONCAT with a space between fname and lname:
data2 = FOREACH data1 GENERATE CONCAT(fname,' ',lname);
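For completeness, here is a sketch of the whole script with that change applied. Two things in it are my own assumptions rather than part of the question: the alias fullname is just an illustrative name, and the nested two-argument CONCAT is there because, as far as I know, older Pig releases accept only two arguments to CONCAT. If your version accepts the three-argument call, the one-liner above should be enough.

```pig
-- Load the tab-separated input; tab is the default field delimiter.
data = LOAD 'persons.txt' AS (fname:chararray, lname:chararray, age:int);

-- Keep only the rows where age is greater than 50.
data1 = FILTER data BY age > 50;

-- Join first and last name with a single space.
-- Nested CONCAT is used so that only two arguments are passed per call
-- (assumption: needed on older Pig versions).
-- 'fullname' is a hypothetical alias chosen for this example.
data2 = FOREACH data1 GENERATE CONCAT(fname, CONCAT(' ', lname)) AS fullname;

STORE data2 INTO 'result.txt';
```

Note that the parentheses in your current output come from writing (fname, lname), which builds a tuple. CONCAT produces a plain chararray, so I would expect the stored lines to contain the names without the surrounding parentheses.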