Tags: hadoop, mapreduce, apache-pig

Apache Pig: sorting by count


I am reading an Apache access log in Pig and counting the total number of connections from each IP address.

A = LOAD 'access.log' using PigStorage(' ') as (f0:chararray,f1:chararray,f2:chararray,f3:chararray,f4:chararray,f5:chararray,f6:chararray);
grp_f5 = GROUP A by f5; 
counts = FOREACH grp_f5 GENERATE group, COUNT(A);
store counts into '/data/accesslog' using PigStorage(','); 

Result:

2.50.3.29,71
71.5.94.4,30
12.0.19.50,6
12.53.17.3,4
155.69.4.4,37
166.77.6.8,12
218.0.7.30,1956
5.10.83.28,1
5.86.82.80,177
50.18.2.73,1
59.10.5.53,377

However, the output is not sorted by count. Any idea how to sort it?


Solution

  • Pig does not guarantee any output order unless you sort the relation explicitly. Give the count column an alias and sort on it with ORDER ... BY:

    counts = FOREACH grp_f5 GENERATE group, COUNT(A) AS cnt;
    counts_ordered = ORDER counts BY cnt DESC;
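
Putting the pieces together, the whole job might look like the sketch below. It reuses the load statement, relation names, and output path from the question; the only changes are the cnt alias, the ORDER step, and storing the sorted relation instead of the unsorted one. It assumes, as in the question, that field f5 holds the client IP for this particular log layout.

```pig
-- Load the log, splitting each line on single spaces (schema taken from the question).
A = LOAD 'access.log' USING PigStorage(' ')
    AS (f0:chararray, f1:chararray, f2:chararray, f3:chararray,
        f4:chararray, f5:chararray, f6:chararray);

-- One group per distinct value of f5 (assumed to be the client IP).
grp_f5 = GROUP A BY f5;

-- Count the rows in each group and name the result so it can be sorted on.
counts = FOREACH grp_f5 GENERATE group, COUNT(A) AS cnt;

-- Highest counts first.
counts_ordered = ORDER counts BY cnt DESC;

-- Store the sorted relation, not the unsorted one.
STORE counts_ordered INTO '/data/accesslog' USING PigStorage(',');
```

With the sample data above, 218.0.7.30,1956 should come first, followed by 59.10.5.53,377 and 5.86.82.80,177. The two addresses with a count of 1 may appear in either order, because rows with equal sort keys have no guaranteed relative order.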