hadoop apache-pig bigdata hadoop-streaming

How to group by key and value using Pig

I am using Pig, and this is part of the text I want to analyse:

SciTePress: 32    
Springer: 10    
Springer: 13    
Springer: 14    
Springer: 1571

What I am trying to achieve is to sum the counts for each title and sort the titles by that sum. For instance, I want the output to look like this:

Springer: 1608  //( i.e. the sum of 10+13+14+1571)
SciTePress: 32

Is there a way to achieve this using Pig?

This is the output I am getting now:

Springer: 1571
SciTePress: 32  
Springer: 14  
Springer: 13    
Springer: 10

These are the commands I have used:

    WORDS = LOAD '../filename' using PigStorage(':') AS (title: chararray, count:int);
    grpd = GROUP WORDS BY count;
    sorted = order WORDS by count desc;
    top5 = limit sorted 5;
    dump top5;

Solution

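Group the data by title, then call SUM on each group to get the total count per title. In the question's script the GROUP is done by count and its result (grpd) is never used, so the rows are only sorted, never summed.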

Input :

SciTePress: 32    
Springer: 10    
Springer: 13    
Springer: 14    
Springer: 1571

Pig Script :

    words = LOAD '/Users/muralirao/learning/pig/a.csv' USING PigStorage(':') AS (title:chararray, title_count:int);
    grp_by_title = GROUP words BY title;
    req_data = FOREACH grp_by_title GENERATE group AS title, SUM(words.title_count) AS total_count;
    req_data_ordered = ORDER req_data BY total_count;

Output : DUMP req_data_ordered

(SciTePress,32)
(Springer,1608)
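
ORDER sorts ascending by default, which is why SciTePress comes first here. If the largest total should come first, as in the question's desired output, a minimal sketch would add DESC and keep the question's LIMIT of 5. The path 'a.csv' is a placeholder, and the names totals, totals_desc and top5 are only illustrative:

```pig
-- Assumes the same colon-separated input as above; 'a.csv' is a placeholder path.
words = LOAD 'a.csv' USING PigStorage(':') AS (title:chararray, title_count:int);

-- One group per distinct title, each holding a bag of that title's rows.
grp_by_title = GROUP words BY title;

-- Sum the counts inside each group's bag.
totals = FOREACH grp_by_title GENERATE group AS title, SUM(words.title_count) AS total_count;

-- DESC puts the largest total first.
totals_desc = ORDER totals BY total_count DESC;

-- Keep at most five titles, as in the question's script.
top5 = LIMIT totals_desc 5;
DUMP top5;
```

With the sample input this should list Springer's total before SciTePress's.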