SUM in Apache Pig: ERROR 1045


I want to compute the ratio of two sums using Pig:

A = LOAD 's3://input' AS (field1:chararray, field2:int, field3:float, field4:float);
filtered_1 = FILTER A BY field3 >= 10;
filtered_2 = FILTER filtered_1 BY field4 >= 50;
grouped = GROUP filtered_2 BY field1;
B = FOREACH grouped GENERATE group as field1, SUM(A.field3)/SUM(A.field4) AS A_avg;

However, I get this error when running the last command:

ERROR grunt.Grunt: ERROR 1045: <line 5, column 55> Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

I cannot see why, since I use GROUP before performing the sum. I have been through the SUM documentation and cannot tell how my statement differs from the examples there.


Solution

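  • grouped = GROUP filtered_2 BY field1;
    
    The relation grouped has no access to the alias A. It contains exactly two fields: group (the field1 key) and a bag named after the relation that was grouped, here filtered_2.
    
    B = FOREACH grouped GENERATE group as field1, SUM(A.field3)/SUM(A.field4) AS A_avg;
    
    Inside "FOREACH grouped", A.field3 does not name a bag of grouped (Pig reads it as a scalar reference to the relation A), so no SUM overload matches and ERROR 1045 is raised. The columns have to be projected from the bag instead: filtered_2.field3 and filtered_2.field4.
    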
    The two FILTER statements can also be simplified:
    
    filtered_1 = FILTER A BY field3 >= 10;
    filtered_2 = FILTER filtered_1 BY field4 >= 50;
    
    Taken together, they apply an AND of both conditions, so a single statement does the same work:
    
    filtered_3 = FILTER A BY field3 >= 10 AND field4 >= 50;
    
    Now group filtered_3 and compute the sums over its bag:
    grouped = GROUP filtered_3 BY field1;
    B = FOREACH grouped GENERATE group as field1, SUM(filtered_3.field3)/SUM(filtered_3.field4) AS A_avg;
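
If the two original FILTER statements are kept as they are, the same fix should apply with filtered_2 as the bag name. The following is a minimal sketch under that assumption; it reuses the input path and schema from the question, and the DESCRIBE and DUMP steps are added here only to inspect the result.

```pig
-- Sketch only: same input path and schema as in the question.
A = LOAD 's3://input' AS (field1:chararray, field2:int, field3:float, field4:float);
filtered_1 = FILTER A BY field3 >= 10;
filtered_2 = FILTER filtered_1 BY field4 >= 50;
grouped = GROUP filtered_2 BY field1;

-- Print the schema of grouped; the bag should be listed under the name
-- filtered_2 (the relation that was grouped), not A.
DESCRIBE grouped;

-- Project the columns from that bag before summing.
B = FOREACH grouped GENERATE group AS field1,
    SUM(filtered_2.field3) / SUM(filtered_2.field4) AS A_avg;
DUMP B;
```

Running DESCRIBE on a grouped relation before writing the FOREACH is a quick way to confirm which bag name the aggregate functions need.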