Search code examples
hadoopapache-pig

Using bicond to get ID?


I have this snippet of table called joinTbl:

PRODUCT_ID  PRODUCT_NAME    ORDER_ID     PRODUCT_ID      CUSTOMER_ID     SALESPERSON_ID      UNIT_PRICE 
   11          CAKE           10946          11               83              1                  31
   11          CAKE           10949          11               10              2                  31
   11          CAKE           11020          11               56              2                  31
   14          CHICKEN        11076          14               9               4                  23.25
   11          CAKE           11077          11               65              1                  31
   14          CHICKEN        11077          14               65              1                  23.25

In Pig Apache, I am trying to get the ORDER_ID if the order consist of both cake and chicken. The expected result is

11077

However I am facing issue trying to do a bicond to get this ORDER_ID. This is the syntax I used:

cakeChicken = FOREACH joinedTbl GENERATE ((PRODUCT_NAME == 'CAKE' AND PRODUCT_NAME == 'CHICKEN') ? ORDER_ID : 0) AS order_both;

The return from this is just 0 which is the else statement.

What am I doing wrong?


Solution

  • Your bincond is working correctly - as it's within FOREACH, it will check each row of data individually. Therefore, each row will only have one value for PRODUCT_NAME, so it cannot both be 'CAKE' and 'CHICKEN'.

    Based on what you want to do, I would use GROUP BY to group on ORDER_ID and then filter the PRODUCT_NAME bag to just those that contain 'CAKE' and 'CHICKEN' using a nested foreach. Finally, filter the data to "not empty" bags. Something like this:

    groupedData = GROUP  joinTbl BY ORDER_ID;
    
    /* Structure:
    ---------------------------------------------------------------------------------------------------------------------------------
    | groupedData | group | joinTbl: bag({PRODUCT_ID, PRODUCT_NAME, ORDER_ID, PRODUCT_ID, CUSTOMER_ID, SALESPERSON_ID, UNIT_PRICE}) |
    ---------------------------------------------------------------------------------------------------------------------------------
    |             | 10946 | {(11, CAKE, 10946, 11, 83, 1, 31)}                                                                      |
    |             | 11077 | {(11, CAKE, 11077, 11, 65, 1, 31), (14, CHICKEN,  11077, 14, 65, 1, 23.25)}                             |
    ---------------------------------------------------------------------------------------------------------------------------------
    */
    
    cakeChickenIds = FOREACH groupedData {
        cakes = FILTER joinTbl BY PRODUCT_NAME == 'CAKE';
        chickens = FILTER joinTbl BY PRODUCT_NAME == 'CHICKEN';
        GENERATE group AS ORDER_ID,
        cakes,
        chickens;
    }
    
    /* Structure:
    ------------------------------------------------------------------------------------------
    | cakeChickenIds | ORDER_ID | cakes: bag({PRODUCT_NAME}) | chickens: bag({PRODUCT_NAME}) |
    ------------------------------------------------------------------------------------------
    |                | 10946    | {(CAKE)}                   | {()}                          |
    |                | 11077    | {(CAKE)}                   | {(CHICKEN)}                   |
    ------------------------------------------------------------------------------------------
    */
    
    -- Both cakes and chickens bags will not be empty if ordered both
    cakeChickenOrders = FILTER cakeChickenIds BY NOT IsEmpty(cakes) AND NOT IsEmpty(chickens);