hadoop apache-pig

Pig throws error for simple Group by and count occurrence task


I am using Pig Latin on Hadoop to count the number of occurrences of each unique search string in a search engine log file, but the script below fails with an error. Please help me out. Thanks in advance.

Pig script

excitelog = load '/user/hadoop/input/excite-small.log' using PigStorage() AS
(encryptcode:chararray, numericid:int, searchstring:chararray);

GroupBySearchString = GROUP excitelog by searchstring;

searchStrFrq = foreach GroupBySearchString Generate group as searchstring,count(searchstring);

Error encountered

 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Solution

  • Change the last statement to:

    searchStrFrq = foreach GroupBySearchString Generate group as searchstring,
                                                    COUNT(excitelog) as kount;
    

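    This fixes two things. First, Pig function names are case sensitive, so lowercase count cannot be resolved, which is what ERROR 1070 reports; the built-in function is COUNT. Second, COUNT takes a bag as input and returns the number of tuples in that bag. Because of the way grouping works in Pig, each tuple of GroupBySearchString has two fields, {group, excitelog}: group holds the search string, and excitelog is itself a bag of all tuples from the original relation that match that group. So COUNT(excitelog) gives you the number of tuples matching the group.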
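
For reference, the whole corrected script might look like the sketch below. It assumes the same input path and schema as in the question. The DESCRIBE and DUMP statements are additions for inspecting the result and are not part of the original answer, and the schema shown in the comment is indicative rather than captured output.

```pig
-- Load the log; PigStorage() with no argument splits fields on tabs.
excitelog = LOAD '/user/hadoop/input/excite-small.log' USING PigStorage()
            AS (encryptcode:chararray, numericid:int, searchstring:chararray);

-- Group rows by search string. Each output tuple is (group, excitelog),
-- where excitelog is a bag of every input tuple with that search string.
GroupBySearchString = GROUP excitelog BY searchstring;

-- Print the schema to see which fields exist after grouping.
-- It should have the form {group, excitelog: {(encryptcode, numericid, searchstring)}}.
DESCRIBE GroupBySearchString;

-- COUNT is written in upper case and is applied to the bag, not to a scalar field.
searchStrFrq = FOREACH GroupBySearchString
               GENERATE group AS searchstring, COUNT(excitelog) AS kount;

-- Write the (searchstring, kount) pairs to the console.
DUMP searchStrFrq;
```

Running DESCRIBE before the FOREACH is a quick way to check the field names after a GROUP: the original field searchstring is no longer available at the top level, only group and the bag excitelog.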