Calculate Average using PIG


I am new to Pig and want to calculate the average of a single column of data that looks like this:

0
10.1
20.1
30
40
50
60
70
80.1

I wrote this Pig script:

dividends = load 'myfile.txt' as (A);
dump dividends
grouped   = group dividends by A;
avg       = foreach grouped generate AVG(grouped.A);
dump avg

It parses the data as

(0)
(10.1)
(20.1)
(30)
(40)
(50)
(60)
(70)
(80.1)

but gives this error for the average:

2013-03-04 15:10:58,289 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: 
<file try.pig, line 4, column 41> Invalid scalar projection: grouped
Details at logfile: /Users/PreetiGupta/Documents/CMPS290S/project/pig_1362438645642.log

Any idea what is wrong?


Solution

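  • The AVG built-in function takes a bag as input. Your GROUP statement groups the rows by the value of A, which produces one group per distinct value, but what you want is a single bag containing every row. The parse error comes from AVG(grouped.A): after a GROUP, the bag inside each record is named after the original relation (dividends), and the relation grouped has no field A, so grouped.A is rejected as an invalid scalar projection.

    Pig's GROUP ALL puts all rows into one bag: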

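    dividends = load 'myfile.txt' as (A);
    dump dividends;
    -- group every row into a single bag named "dividends"
    grouped   = group dividends all;
    -- AVG takes the bag, referenced by the original relation name
    avg       = foreach grouped generate AVG(dividends.A);
    dump avg;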