
Take MIN EFF_DT and MAX CANC_DT from data in Pig


Schema:

TYP|ID|RECORD|SEX|EFF_DT|CANC_DT

DMF|1234567|98765432|M|2011-08-30|9999-12-31
DMF|1234567|98765432|M|2011-04-30|9999-12-31
DMF|1234567|98765432|M|2011-04-30|9999-12-31

Suppose I have multiple records like this. I only want to display the records that have the minimum EFF_DT and the maximum CANC_DT.

In this case, I want to display just this one record:

DMF|1234567|98765432|M|2011-04-30|9999-12-31

Thank you


Solution

  • Get the minimum EFF_DT and the maximum CANC_DT, then use them to filter the relation. Assuming you have a relation A:

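    -- Collapse A into a single group so the aggregates cover every record.
    B = GROUP A ALL;
    -- X and Y each hold one row: the earliest EFF_DT and the latest CANC_DT.
    X = FOREACH B GENERATE MIN(A.EFF_DT);
    Y = FOREACH B GENERATE MAX(A.CANC_DT);

    -- X.$0 and Y.$0 are read as scalars because X and Y are single-row relations.
    C = FILTER A BY ((EFF_DT == X.$0) AND (CANC_DT == Y.$0));
    -- Remove the exact duplicates that survive the filter.
    D = DISTINCT C;
    DUMP D;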
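
The answer assumes relation A already exists. As a minimal sketch, the whole script could look like the following, assuming the three sample records sit in a pipe-delimited file without a header row. The file name `records.txt` and the aliases `min_eff` and `max_canc` are hypothetical. The dates are loaded as chararray, which is enough here because `yyyy-MM-dd` strings sort in chronological order.

```pig
-- Hypothetical input file: the three sample records, pipe-delimited, no header row.
A = LOAD 'records.txt' USING PigStorage('|')
    AS (TYP:chararray, ID:chararray, RECORD:chararray,
        SEX:chararray, EFF_DT:chararray, CANC_DT:chararray);

-- One group holding every record, so MIN and MAX see the whole relation.
B = GROUP A ALL;
X = FOREACH B GENERATE MIN(A.EFF_DT) AS min_eff;    -- single row: earliest EFF_DT
Y = FOREACH B GENERATE MAX(A.CANC_DT) AS max_canc;  -- single row: latest CANC_DT

-- Single-row relations can be referenced as scalars inside an expression.
C = FILTER A BY (EFF_DT == X.min_eff) AND (CANC_DT == Y.max_canc);

-- The sample data contains the matching record twice, so drop the duplicate.
D = DISTINCT C;
DUMP D;
-- (DMF,1234567,98765432,M,2011-04-30,9999-12-31)
```

Note that the filter keeps only records that carry both the minimum EFF_DT and the maximum CANC_DT at the same time. If no single record has both values, D is empty.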