I want to compute the ratio of two sums using Pig:
A = LOAD 's3://input' AS (field1:chararray, field2:int, field3:float, field4:float);
filtered_1 = FILTER A BY field3 >= 10;
filtered_2 = FILTER filtered_1 BY field4 >= 50;
grouped = GROUP filtered_2 BY field1;
B = FOREACH grouped GENERATE group as field1, SUM(A.field3)/SUM(A.field4) AS A_avg;
However, I get this error when running the last command:
ERROR grunt.Grunt: ERROR 1045: <line 5, column 55> Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
I cannot work out why, since I GROUP before computing the sums. I have been through the SUM documentation and I don't see how my statement differs from what it describes.
grouped = GROUP filtered_2 BY field1;
After this statement, grouped does not expose the alias A.
B = FOREACH grouped GENERATE group as field1, SUM(A.field3)/SUM(A.field4) AS A_avg;
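Inside "FOREACH grouped" there is no A to reference. The only fields available are group and a bag named after the relation that was grouped (here filtered_2), and that bag is what holds field3 and field4. SUM needs a bag such as filtered_2.field3, so SUM(A.field3) cannot be matched to any SUM implementation.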
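A quick way to check which names are visible after a GROUP is to ask Pig for the schema. The snippet below is a minimal sketch that reuses the statements from the question, with the LOAD schema spelled field1/field2 so that it matches GROUP BY field1:

```pig
-- Same pipeline as in the question, ending at the GROUP.
A = LOAD 's3://input' AS (field1:chararray, field2:int, field3:float, field4:float);
filtered_1 = FILTER A BY field3 >= 10;
filtered_2 = FILTER filtered_1 BY field4 >= 50;
grouped = GROUP filtered_2 BY field1;

-- Print the schema of grouped: it should list the key as "group"
-- and a bag named after the grouped relation (filtered_2), not A.
DESCRIBE grouped;
```

Whatever bag name DESCRIBE reports is the one to use inside SUM in the following FOREACH.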
filtered_1 = FILTER A BY field3 >= 10;
filtered_2 = FILTER filtered_1 BY field4 >= 50;
These two consecutive filters amount to a single AND condition, so they can be combined into one statement:
filtered_3 = FILTER A BY field3 >= 10 AND field4 >= 50;
Now group the filtered relation and refer to its bag by that alias inside SUM:
grouped = GROUP filtered_3 BY field1;
B = FOREACH grouped GENERATE group as field1, SUM(filtered_3.field3)/SUM(filtered_3.field4) AS A_avg;
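Put together, the corrected script might look like the sketch below. It assumes the LOAD schema is spelled field1/field2 so that it matches GROUP BY field1. The final DUMP is added here only to inspect the result and is not part of the original question.

```pig
A = LOAD 's3://input' AS (field1:chararray, field2:int, field3:float, field4:float);

-- One filter with both conditions instead of two chained filters.
filtered_3 = FILTER A BY field3 >= 10 AND field4 >= 50;

-- The bag inside grouped is named filtered_3, after the grouped relation.
grouped = GROUP filtered_3 BY field1;

-- Project the bag columns through filtered_3, not A.
B = FOREACH grouped GENERATE group AS field1,
        SUM(filtered_3.field3) / SUM(filtered_3.field4) AS A_avg;

-- Added for inspection only.
DUMP B;
```

Because the filter keeps only rows with field4 >= 50, the denominator is positive for every group that survives the filter.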