I am using Pig, and this is part of the text I want to analyse:
SciTePress: 32
Springer: 10
Springer: 13
Springer: 14
Springer: 1571
What I am trying to achieve is to sum the counts for each title and sort the totals in descending order. For instance, I want the output to look like this:
Springer: 1608 // i.e. the sum of 10+13+14+1571
SciTePress: 32
Is there a way to achieve this using Pig?
This is the output I am getting now:
Springer: 1571
SciTePress: 32
Springer: 14
Springer: 13
Springer: 10
These are the commands I have used:
WORDS = LOAD '../filename' using PigStorage(':') AS (title: chararray, count:int);
grpd = GROUP WORDS BY count;
sorted = order WORDS by count desc;
top5 = limit sorted 5;
dump top5;
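The script in the question groups by count and then orders the ungrouped WORDS relation, so the grouping is never used and nothing is summed. Instead, group the data by title and call SUM on each group to get its total.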
Input:
SciTePress: 32
Springer: 10
Springer: 13
Springer: 14
Springer: 1571
Pig Script:
words = LOAD '/Users/muralirao/learning/pig/a.csv' USING PigStorage(':') AS (title: chararray, title_count:int);
grp_by_title = GROUP words BY title;
req_data = FOREACH grp_by_title GENERATE group AS title, SUM(words.title_count) AS total_count;
req_data_ordered = ORDER req_data BY total_count;
Output (DUMP req_data_ordered):
(SciTePress,32)
(Springer,1608)
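ORDER sorts in ascending order by default, which is why SciTePress comes first here, whereas the expected output in the question lists the largest total first. A minimal sketch of a descending variant follows. The path '../filename' is the placeholder from the question, and the relation names totals, totals_desc and top5 are only illustrative:

```pig
-- Load "title: count" lines, splitting each line on the colon.
words = LOAD '../filename' USING PigStorage(':') AS (title:chararray, title_count:int);

-- One group per title, then sum the counts inside each group.
grp_by_title = GROUP words BY title;
totals = FOREACH grp_by_title GENERATE group AS title, SUM(words.title_count) AS total_count;

-- DESC puts the largest total first; LIMIT keeps at most five rows.
totals_desc = ORDER totals BY total_count DESC;
top5 = LIMIT totals_desc 5;
DUMP top5;
```

With the five input lines above, this should print (Springer,1608) followed by (SciTePress,32).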