apache-pig

Filter after a FOREACH statement in Pig


I have a table U with a single column, named u_id.

filter_out = filter A BY s_id == (FOREACH u GENERATE u_id);

I am trying to filter table A by matching it against every row in table U. Essentially, if s_id from A (table 1) matches u_id from the second table, filter it out.

I keep getting this error: mismatched input 'u' expecting LEFT_PAREN

-------------2nd approach----------------

I have also tried converting u to a tuple:

totuple = FOREACH u GENERATE TOTUPLE (u_id);

filter_out = filter A BY s_id in (totuple);

This fails with the error: A column needs to be projected from a relation for it to be used as a scalar


Solution

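  • A FILTER condition cannot take another relation as its operand, which is why both attempts fail. Instead, JOIN the two tables. Doing so keeps only the records from table A that have a matching record in table U. Finally, generate the columns needed.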

    B = JOIN A BY s_id, U BY u_id;
    C = FOREACH B GENERATE A::s_id; -- Select the needed columns from the joined relation; A:: disambiguates fields that came from A.
    DUMP C;
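
  • If the goal is instead to drop the rows of A that have a match in U (one possible reading of "filter it out"), a left outer join followed by a null check is a common pattern. The following is a minimal sketch, assuming A and U were loaded with declared schemas (Pig needs U's schema to pad unmatched rows with nulls). The aliases J, K and R are illustrative names, not from the question.

    J = JOIN A BY s_id LEFT OUTER, U BY u_id; -- Rows of A with no match in U get null for U::u_id.
    K = FILTER J BY U::u_id IS NULL;          -- Keep only the rows of A that had no match.
    R = FOREACH K GENERATE A::s_id;           -- Project the columns of A that are needed.
    DUMP R;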