Tags: hadoop, hive, apache-pig

Pig: Filter a Hive table by the result of a previous table


I need to query one Hive table and then filter a second table using one column from the first table's result.

Example:

    A = LOAD 'db.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
    filterA = filter A by (id=='123');
    B = LOAD 'db.table2' USING org.apache.hive.hcatalog.pig.HCatLoader();
    -- The problem is here: filterA has many rows, and I need to apply the filter for each of them.
    filterB = filter B by (id==filterA.id);

Data in A:

    tabid  id  dept  location
    1      1   IS    SJ
    2      4   CS    SF
    3      5   EC    MD

Data in B:

    tabid  id  name  address
    1      4   john  123 S AVE
    2      5   jane  456 N BLVD
    3      9   nick  789 GREAT LAKE DR

Expected result:

    tabid  id  name  address
    1      4   john  123 S AVE
    2      5   jane  456 N BLVD


Solution

  • As asked in the comment, it sounds like what you're looking for is a join rather than a filter: joining A and B on id keeps only the rows of B whose id also appears in A. Sorry if I misunderstood your question.

    A = LOAD 'db.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
    B = LOAD 'db.table2' USING org.apache.hive.hcatalog.pig.HCatLoader();
    C = JOIN A by id, B by id;
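One thing to watch: the JOIN above returns the columns of both tables (A::tabid, A::id, A::dept, A::location, B::tabid, ...), while the expected result contains only B's columns. Below is a minimal sketch that projects B's columns and de-duplicates A's ids before joining, so a repeated id in table1 does not duplicate rows of B. It assumes the Hive columns are named tabid, id, name and address as in the sample data; the aliases a_ids, a_ids_uniq, joined and matched are hypothetical names chosen for illustration.

    A = LOAD 'db.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
    B = LOAD 'db.table2' USING org.apache.hive.hcatalog.pig.HCatLoader();
    -- Keep only the join key from table1 (assumed column name: id) and drop duplicate ids,
    -- so each row of B matches at most one row on the right side of the join.
    a_ids = FOREACH A GENERATE id;
    a_ids_uniq = DISTINCT a_ids;
    -- Inner join: only rows of B whose id also appears in table1 survive.
    joined = JOIN B BY id, a_ids_uniq BY id;
    -- Project B's columns only (assumed names: tabid, id, name, address).
    matched = FOREACH joined GENERATE B::tabid AS tabid, B::id AS id,
                                      B::name AS name, B::address AS address;
    DUMP matched;

To apply the id=='123' condition from the question first, use filterA in place of A when building a_ids. Note that Pig does not guarantee the output order, so add an ORDER BY if the rows must come back sorted by tabid.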