Tags: filter, group-by, apache-pig

Count/Sum in Apache Pig


I am a beginner with Apache Pig. There is a table with the following fields:

table - amount:long date:string country:string

My first goal is to get the count of the amount field per country, month by month. For example, this is the end result I need:

(Exhibit A)
201201 USA 100
201201 UK 150
201305 ITALY 200
201305 USA 120
201305 UK 20
201403 ITALY 300

The numbers in the last column (100, 150, 200, 300, and so on) are the counts of amount for each date and country. To achieve this, I wrote the following Pig script, which produces the intended result.

data = ORDER table BY date ASC;

data1 = GROUP data BY (date, country);

countof_amount = FOREACH data1 GENERATE
             FLATTEN(group) AS (date, country),
             COUNT(data) AS amount_count;

countof_amount1 = order countof_amount by date ASC;

Now I want to sum these counts of amount for every date across all countries. For example, from Exhibit A, I would like the following result:

201201 250
201305 340
201403 300

How should I go about doing this?

Thanks in advance!


Solution

  • Add the last three lines below to your script and it will work. I tested it locally and it works fine.

    -- Load the space-delimited input file.
    table = LOAD 'input.txt' USING PigStorage(' ') AS (amount:long, date:chararray, country:chararray);
    data = ORDER table BY date ASC;
    data1 = GROUP data BY (date, country);
    -- Count of amount per (date, country), as in your script.
    countof_amount = FOREACH data1 GENERATE
                 FLATTEN(group) AS (date, country),
                 COUNT(data.amount) AS amount_count;
    countof_amount1 = ORDER countof_amount BY date ASC;

    -- New lines: regroup by date alone and sum the per-country counts.
    mycount = GROUP countof_amount1 BY date;
    getFinalCount = FOREACH mycount GENERATE group AS date, SUM(countof_amount1.amount_count) AS total;
    DUMP getFinalCount;
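
  • A possible simplification, offered as a sketch that I have not run against your data: the sum of the per-country counts for a date is the same as the number of rows for that date, so if you only need the per-date totals you could group by date directly. The alias names by_date and date_totals are made up for illustration. COUNT(table.amount) skips null amounts in the same way as the COUNT in the script above.

    -- Assumes `table` has been loaded as in the script above.
    by_date = GROUP table BY date;
    -- One output row per date: the date and its row count.
    date_totals = FOREACH by_date GENERATE
                 group AS date,
                 COUNT(table.amount) AS total;
    DUMP date_totals;

    This avoids the intermediate (date, country) grouping, but keep the longer script if you also need the per-country counts from Exhibit A.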