PIG: sum and division, creating an object


I am writing a Pig program that loads a file whose entries are separated by tabs.

ex: name TAB year TAB count TAB...

file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);

-- Group by type
grouped = GROUP file BY type;

-- Flatten
by_type = FOREACH grouped GENERATE FLATTEN(group) AS (type, year, match_count, volume_count);

group_operat = FOREACH by_type GENERATE  
        SUM(match_count) AS sum_m,
        SUM(volume_count) AS sum_v,
       (float)sum_m/sm_v;

DUMP group_operat;

The issue lies in the group_operat relation I am trying to create. I want to sum all the match counts, sum all the volume counts, and divide the summed match counts by the summed volume counts.

What am I doing wrong in my arithmetic operations/object creation? The error I receive is: <line 7, column 11> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "type:NULL,year:NULL,match_count:NULL,volume_count:NULL", right is "group:chararray"

Thank you.


Solution

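  • Try it like this. After GROUP, each record has only two fields: group (the type) and a bag named file that holds the original rows. FLATTEN(group) therefore yields a single chararray, which cannot be renamed to four fields, hence ERROR 1031. Aggregate over the bag instead, e.g. SUM(file.match_count). The script below returns each type with its summed match count divided by its summed volume count.

    Updated with working code: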

    input.txt

    A       2001     10      2
    A       2002     20      3
    B       2003     30      4
    B       2004     40      1
    

    PigScript:

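    -- PigStorage() with no argument splits on tabs, so input.txt must be tab-separated
    file = LOAD 'input.txt' USING PigStorage() AS (type: chararray, year: chararray,
                                                   match_count: float, volume_count: float);
    -- Each grouped record is (group, file), where file is a bag of the rows for that type
    grouped = GROUP file BY type;
    group_operat = FOREACH grouped {
        -- Aggregate over the bag's columns, not over top-level fields
        sum_m = SUM(file.match_count);
        sum_v = SUM(file.volume_count);
        GENERATE group, (float)(sum_m / sum_v) AS sum_mv;
    }
    DUMP group_operat;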
    

    Output:

    (A,6.0)
    (B,14.0)
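
    If the two sums are also needed in the output, as the question suggests, one possible variation is to compute them in one FOREACH and divide in a second. This is only a sketch: the relation names sums and ratios are made up for illustration, and it assumes the grouped relation from the script above. The split into two steps is there because an alias introduced with AS in a GENERATE cannot be referenced inside that same GENERATE, which is one reason the original (float)sum_m/sm_v line would not validate.

    ```pig
    -- Assumes `grouped` was produced by: grouped = GROUP file BY type;
    -- `sums` and `ratios` are hypothetical names chosen for this sketch.
    sums = FOREACH grouped GENERATE
               group AS type,
               SUM(file.match_count)  AS sum_m,
               SUM(file.volume_count) AS sum_v;

    -- sum_m and sum_v now exist as named fields, so they can be divided here.
    ratios = FOREACH sums GENERATE type, sum_m, sum_v, sum_m / sum_v AS sum_mv;
    DUMP ratios;
    ```

    With the sample input.txt this should report sums of 30 and 5 for A and 70 and 5 for B, with the same ratios as above (6.0 and 14.0).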