apache-pig

Filter out records that exist in another table


Table A has 100 records; the column of interest is Column1.

Table B has 10 records; the column of interest is also Column1.

I need to filter out of Table A every record whose Column1 value also appears in Table B. This is what I tried:

Table_B = foreach B generate flatten(TOTUPLE(SEN_NBR));

result = FILTER TABLE_A BY SEN_NBR NOT IN (Table_B);

Any help would be great!


Solution

  • Use a LEFT OUTER JOIN and keep only the rows where the Table B side is null. That leaves only the records from Table A that have no match in Table B.

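    -- Unmatched Table_A rows get nulls in Table_B's columns
    A = JOIN Table_A BY SEN_NBR LEFT OUTER, Table_B BY SEN_NBR;
    -- After a JOIN, fields are referenced with ::, not a dot
    B = FILTER A BY Table_B::SEN_NBR IS NULL;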
    

    NOTE: I've answered a similar question with an example here
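
For completeness, here is a minimal end-to-end sketch of the approach above. The file names, the delimiter, and the extra NAME column are assumptions made up for illustration; only SEN_NBR comes from the question. After the filter, the relation still carries the (null) Table_B columns, so the last step projects back to Table_A's columns.

```pig
-- Hypothetical inputs and schemas; adjust paths, delimiter and types to your data.
Table_A = LOAD 'table_a.csv' USING PigStorage(',') AS (SEN_NBR:chararray, NAME:chararray);
Table_B = LOAD 'table_b.csv' USING PigStorage(',') AS (SEN_NBR:chararray);

-- Left outer join: Table_A rows with no match get nulls in Table_B's columns.
joined = JOIN Table_A BY SEN_NBR LEFT OUTER, Table_B BY SEN_NBR;

-- Keep only the rows that found no match in Table_B.
unmatched = FILTER joined BY Table_B::SEN_NBR IS NULL;

-- Drop the null Table_B columns and restore Table_A's field names.
result = FOREACH unmatched GENERATE Table_A::SEN_NBR AS SEN_NBR, Table_A::NAME AS NAME;

DUMP result;
```

Note that Table_A rows whose SEN_NBR is itself null never match in the join, so they are kept in the result; add an extra filter if that is not what you want.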