I have this snippet of table called joinTbl
:
PRODUCT_ID PRODUCT_NAME ORDER_ID PRODUCT_ID CUSTOMER_ID SALESPERSON_ID UNIT_PRICE
11 CAKE 10946 11 83 1 31
11 CAKE 10949 11 10 2 31
11 CAKE 11020 11 56 2 31
14 CHICKEN 11076 14 9 4 23.25
11 CAKE 11077 11 65 1 31
14 CHICKEN 11077 14 65 1 23.25
In Pig Apache, I am trying to get the ORDER_ID
if the order consist of both cake and chicken. The expected result is
11077
However I am facing issue trying to do a bicond to get this ORDER_ID
. This is the syntax I used:
cakeChicken = FOREACH joinedTbl GENERATE ((PRODUCT_NAME == 'CAKE' AND PRODUCT_NAME == 'CHICKEN') ? ORDER_ID : 0) AS order_both;
The return from this is just 0
which is the else statement.
What am I doing wrong?
Your bincond is working correctly - as it's within FOREACH
, it will check each row of data individually. Therefore, each row will only have one value for PRODUCT_NAME
, so it cannot both be 'CAKE'
and 'CHICKEN'
.
Based on what you want to do, I would use GROUP BY
to group on ORDER_ID
and then filter the PRODUCT_NAME
bag to just those that contain 'CAKE'
and 'CHICKEN'
using a nested foreach. Finally, filter the data to "not empty" bags. Something like this:
groupedData = GROUP joinTbl BY ORDER_ID;
/* Structure:
---------------------------------------------------------------------------------------------------------------------------------
| groupedData | group | joinTbl: bag({PRODUCT_ID, PRODUCT_NAME, ORDER_ID, PRODUCT_ID, CUSTOMER_ID, SALESPERSON_ID, UNIT_PRICE}) |
---------------------------------------------------------------------------------------------------------------------------------
| | 10946 | {(11, CAKE, 10946, 11, 83, 1, 31)} |
| | 11077 | {(11, CAKE, 11077, 11, 65, 1, 31), (14, CHICKEN, 11077, 14, 65, 1, 23.25)} |
---------------------------------------------------------------------------------------------------------------------------------
*/
cakeChickenIds = FOREACH groupedData {
cakes = FILTER joinTbl BY PRODUCT_NAME == 'CAKE';
chickens = FILTER joinTbl BY PRODUCT_NAME == 'CHICKEN';
GENERATE group AS ORDER_ID,
cakes,
chickens;
}
/* Structure:
------------------------------------------------------------------------------------------
| cakeChickenIds | ORDER_ID | cakes: bag({PRODUCT_NAME}) | chickens: bag({PRODUCT_NAME}) |
------------------------------------------------------------------------------------------
| | 10946 | {(CAKE)} | {()} |
| | 11077 | {(CAKE)} | {(CHICKEN)} |
------------------------------------------------------------------------------------------
*/
-- Both cakes and chickens bags will not be empty if ordered both
cakeChickenOrders = FILTER cakeChickenIds BY NOT IsEmpty(cakes) AND NOT IsEmpty(chickens);