I am reading an Apache access log with Pig and counting the total number of connections per IP address.
A = LOAD 'access.log' using PigStorage(' ') as (f0:chararray,f1:chararray,f2:chararray,f3:chararray,f4:chararray,f5:chararray,f6:chararray);
grp_f5 = GROUP A by f5;
counts = FOREACH grp_f5 GENERATE group, COUNT(A);
store counts into '/data/accesslog' using PigStorage(',');
Result:
2.50.3.29,71
71.5.94.4,30
12.0.19.50,6
12.53.17.3,4
155.69.4.4,37
166.77.6.8,12
218.0.7.30,1956
5.10.83.28,1
5.86.82.80,177
50.18.2.73,1
59.10.5.53,377
However, the output is not sorted by count. How can I sort it?
Pig does not sort a relation unless you ask it to, so the output order is arbitrary. Give the count a name with AS so it can be referenced, then sort with ORDER BY:
counts = FOREACH grp_f5 GENERATE group, COUNT(A) AS cnt;
counts_ordered = ORDER counts BY cnt DESC;
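Putting the two statements above together with the script from the question, a minimal sketch of the full job might look like the following. The alias `ip` is a name introduced here for illustration; the input file, field layout, and output path are taken from the question. Note that the STORE now writes `counts_ordered` rather than `counts`.

```pig
-- Load the space-delimited access log (field layout as in the question).
A = LOAD 'access.log' USING PigStorage(' ') AS (f0:chararray, f1:chararray, f2:chararray, f3:chararray, f4:chararray, f5:chararray, f6:chararray);

-- Group by the field holding the IP address and count the rows in each group.
grp_f5 = GROUP A BY f5;
counts = FOREACH grp_f5 GENERATE group AS ip, COUNT(A) AS cnt;  -- 'ip' is an illustrative alias

-- Sort by the named count column, largest first.
counts_ordered = ORDER counts BY cnt DESC;

-- Store the sorted relation, not the unsorted one.
STORE counts_ordered INTO '/data/accesslog' USING PigStorage(',');
```

With the sample data from the question, the output should begin with 218.0.7.30,1956. Rows with equal counts (for example the two IPs with a count of 1) can appear in either order unless a second sort key is added, such as `ORDER counts BY cnt DESC, ip ASC`. Also keep in mind that STORE fails if the output directory already exists, so the previous `/data/accesslog` has to be removed or a different path used.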