The input is a JSON Lines text file (one JSON object per line):
{"store":"079","items":[{"name":"早晨全餐","unit_price":18,"quantity":1,"total":18},{"name":"麦趣鸡盒","unit_price":78,"quantity":5,"total":390},{"name":"巨无霸","unit_price":17,"quantity":5,"total":85},{"name":"香骨鸡腿","unit_price":12,"quantity":2,"total":24},{"name":"小薯条","unit_price":7,"quantity":5,"total":35}],"date":"\/Date(1483256820000)\/","oId":"27841ef9-f88e-478f-8f20-17c3ad090ebc"}
{"store":"041","items":[{"name":"小薯条","unit_price":7,"quantity":2,"total":14},{"name":"巨无霸","unit_price":17,"quantity":4,"total":68}],"date":"\/Date(1483221780000)\/","oId":"afee2e6d-0f81-4780-82e9-2169bf3c43f3"}
{"store":"008","items":[{"name":"奶昔香草","unit_price":9,"quantity":5,"total":45},{"name":"小薯条","unit_price":7,"quantity":2,"total":14}],"date":"\/Date(1483248600000)\/","oId":"802ea077-1eef-4cc9-af89-af7398e56792"}
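Each line is a self-contained JSON object, so the file can be read one line at a time. The date field is not an ISO-8601 string. In JSON, "\/" is simply an escaped "/", so the decoded value is "/Date(1483256820000)/". The minimal Python sketch below decodes the sample date values, assuming this is the .NET-style encoding of epoch milliseconds in UTC. The helper name parse_date is hypothetical.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# The "date" values from the three sample lines, exactly as they appear in the JSON text.
RAW_DATES = [
    r'"\/Date(1483256820000)\/"',
    r'"\/Date(1483221780000)\/"',
    r'"\/Date(1483248600000)\/"',
]

# Assumption: the value has the form "/Date(<epoch milliseconds>)/".
DATE_RE = re.compile(r"^/Date\((-?\d+)\)/$")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_date(value: str) -> datetime:
    """Hypothetical helper: convert "/Date(ms)/" to a UTC datetime."""
    match = DATE_RE.match(value)
    if match is None:
        raise ValueError(f"unexpected date format: {value!r}")
    return EPOCH + timedelta(milliseconds=int(match.group(1)))

for raw in RAW_DATES:
    value = json.loads(raw)  # the JSON decoder turns "\/" into "/"
    print(value, "->", parse_date(value).isoformat())
```

For the first record this gives 2017-01-01T07:47:00+00:00.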
I want to group the records by store and calculate, for each store, the sum of total over all of its items, for example:
store_name  total_amount
------------------------
001         2212.26
002         3245.46
003         888888
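For reference, the expected result for the three sample lines can be computed outside Pig. The minimal Python sketch below follows the intended pipeline: each record's items are flattened into rows, the rows are grouped by the store key, and total is summed per store. Decimal is used to match the bigdecimal type declared for total. The function name store_totals is hypothetical.

```python
import json
from collections import defaultdict
from decimal import Decimal

# The three sample records from the question (JSON Lines: one object per line).
SAMPLE_LINES = r"""
{"store":"079","items":[{"name":"早晨全餐","unit_price":18,"quantity":1,"total":18},{"name":"麦趣鸡盒","unit_price":78,"quantity":5,"total":390},{"name":"巨无霸","unit_price":17,"quantity":5,"total":85},{"name":"香骨鸡腿","unit_price":12,"quantity":2,"total":24},{"name":"小薯条","unit_price":7,"quantity":5,"total":35}],"date":"\/Date(1483256820000)\/","oId":"27841ef9-f88e-478f-8f20-17c3ad090ebc"}
{"store":"041","items":[{"name":"小薯条","unit_price":7,"quantity":2,"total":14},{"name":"巨无霸","unit_price":17,"quantity":4,"total":68}],"date":"\/Date(1483221780000)\/","oId":"afee2e6d-0f81-4780-82e9-2169bf3c43f3"}
{"store":"008","items":[{"name":"奶昔香草","unit_price":9,"quantity":5,"total":45},{"name":"小薯条","unit_price":7,"quantity":2,"total":14}],"date":"\/Date(1483248600000)\/","oId":"802ea077-1eef-4cc9-af89-af7398e56792"}
"""

def store_totals(text: str) -> dict[str, Decimal]:
    """Sum item totals per store, like FLATTEN(items) + GROUP BY store + SUM(total)."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Each item becomes one "flattened" row that adds to its store's sum.
        for item in record["items"]:
            totals[record["store"]] += Decimal(str(item["total"]))
    return dict(totals)

for store, total in sorted(store_totals(SAMPLE_LINES).items()):
    print(store, total)
```

For the sample data this prints 008 59, 041 82 and 079 552, which is what the grouped Pig relation should contain for these three records.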
My Pig script:
store_table = LOAD '/example/store-data/2017-store-sales-data.json'
    USING JsonLoader('
        store_name:chararray,
        items: {(
            name:chararray,
            unit_price:Bigdecimal,
            quantity:int,
            total:Bigdecimal)
        },
        date:Datetime,
        oId:chararray'
    );
platten_table = foreach store_table generate flatten(items), store_name;
store_group = group platten_table by store_name;
result = foreach store_group {
    total_sum = sum(platten_table.items::total);
    Generate group,total_sum;
}
Pig fails with this error:
2017-11-28 08:53:54,357 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input 'Generate' expecting SEMI_COLON
Eval function names are case-sensitive (unlike Pig Latin keywords such as FOREACH and GENERATE, which are not), so you need to call the built-in eval function as SUM, in upper case.
Corrected snippet:
result = foreach store_group {
    -- SUM must be upper case: built-in function names are case-sensitive
    total_sum = SUM(platten_table.items::total);
    GENERATE group, total_sum;
}