
Pig filter not working


I have the following Pig script:

meta_file = LOAD 'meta_file' USING PigStorage(',');

DUMP meta_file;

meta = FOREACH meta_file GENERATE (chararray)$0 AS is_vta:chararray, (chararray)$1 AS id:long;

DUMP meta;

new_d = FILTER meta BY (is_vta == 't');
DUMP new_d;

Contents of meta_file:

"t","7181397"
"t","6331589"
"f","7266217"
"t","6051440"
"t","6901437"
"t","6805292"
"f","7144764"
"t","6820265"
"f","7515321"
"t","4777938"

The DUMP of meta_file matches the contents of the file, and so does the DUMP of meta, but new_d is empty. I can see rows in meta where is_vta is t, yet nothing passes the filter. Why isn't meta being filtered properly? What am I doing wrong? I am new to Pig Latin and cannot figure out what the problem is.

Thanks for all your help.


Solution

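  • PigStorage(',') only splits each line on the delimiter and does not strip the double quotes, so is_vta holds "t" (quotes included) rather than t, and is_vta == 't' never matches. Simple way, using a pattern that tolerates the quotes (note that it also matches any other value containing a t):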

    new_d = FILTER meta BY is_vta MATCHES '.*t.*';
    

    Another solution, removing the quotes before comparing:

    remquotes = FOREACH meta GENERATE REPLACE($0, '\\"', '') AS is_vta:chararray, id;
    
    new_d = FILTER remquotes BY is_vta == 't';
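
    The quote stripping can also be done once, at the point where the fields are projected, so that both columns are clean before any filtering. This is a minimal sketch under the assumption that the input is the same comma-separated meta_file shown in the question. Casting id to long is also an assumption, based on the id:long schema the question declares.

```pig
-- Sketch only: assumes the same comma-separated meta_file as in the question.
meta_file = LOAD 'meta_file' USING PigStorage(',');

-- Strip the double quotes from both fields while projecting.
-- Casting id to long is an assumption based on the declared id:long schema.
meta = FOREACH meta_file GENERATE
    REPLACE((chararray)$0, '"', '') AS is_vta:chararray,
    (long)REPLACE((chararray)$1, '"', '') AS id:long;

-- The plain equality test now sees t rather than "t".
new_d = FILTER meta BY is_vta == 't';
DUMP new_d;
```

    With the sample data in the question, this should keep the seven rows whose first field is t. If the quotes need to stay in the data, comparing against the quoted value directly, as in is_vta == '"t"', is another option.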