I am using Hadoop's Pig Latin to find the number of occurrences of each unique search string in a search engine log file (click here to view the sample log file). The script below fails with the error shown. Please help me out. Thanks in advance.
Pig script
excitelog = load '/user/hadoop/input/excite-small.log' using PigStorage() AS
(encryptcode:chararray, numericid:int, searchstring:chararray);
GroupBySearchString = GROUP excitelog by searchstring;
searchStrFrq = foreach GroupBySearchString Generate group as searchstring,count(searchstring);
Error encountered
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
You need to do:
searchStrFrq = foreach GroupBySearchString Generate group as searchstring,
COUNT(excitelog) as kount;
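This works because of how grouping behaves in Pig: each tuple of GroupBySearchString has two fields, {group, excitelog}, where group is the search string and excitelog is itself a bag of all tuples matching that group. COUNT is a built-in function that takes a bag as input and returns the number of tuples in it, so COUNT(excitelog) gives you the number of tuples matching the group. Note also that function names in Pig are case-sensitive, which is why the lowercase count could not be resolved (ERROR 1070).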
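To check the structure described above, here is a minimal sketch of the whole script, assuming the same tab-delimited input path and field names as in the question. DESCRIBE and DUMP are the usual Grunt diagnostics. The exact text DESCRIBE prints differs between Pig versions, so no output is shown here.

```pig
-- Load the log; PigStorage() with no argument splits fields on tabs.
excitelog = LOAD '/user/hadoop/input/excite-small.log' USING PigStorage() AS
    (encryptcode:chararray, numericid:int, searchstring:chararray);

-- One output tuple per distinct search string.
GroupBySearchString = GROUP excitelog BY searchstring;

-- Prints the schema: a 'group' field plus a bag named 'excitelog'
-- holding the original three-field tuples for that group.
DESCRIBE GroupBySearchString;

-- COUNT must be upper case and must be given the bag, not a scalar field.
searchStrFrq = FOREACH GroupBySearchString GENERATE
    group AS searchstring,
    COUNT(excitelog) AS kount;

-- Print (searchstring, kount) pairs to the console.
DUMP searchStrFrq;
```

You can paste these statements into the Grunt shell one at a time, or save them to a file and pass that file to the pig command. Running DESCRIBE before the FOREACH is a quick way to see which alias holds the bag that COUNT expects.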