Tags: hadoop, hive, apache-pig

Word count in PIG


Assume I have a text file named count.txt that contains the following paragraph:

    I am working  in hadoop along with  various courses like Hadoop, Hana, Java etc
    I love working with hadoop
    This is hadoop project 

Now I need to find how many times the word hadoop occurs in the above file.

The following code is what I have tried:

    c1 = load '/...../count.txt' using PigStorage(',') as (Name:chararray);
    c2 = foreach c1 generate FLATTEN(TOKENIZE(LOWER(Name))) as (Name1:chararray);
    dump c2;
    c3 = filter c2 by Name1 == 'hadoop';
    dump c3;

Here is the output I am getting:

    (hadoop)
    (hadoop)
    (hadoop)
    (hadoop)

What I need is the number 4, not the word hadoop repeated 4 times. Hence I tried to execute

`c4 = foreach c3 generate COUNT($0);`

and I am getting an error. Kindly help me; it may be something simple that I am unable to find. Thanks in advance.


Solution

  • Try this:

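    COUNT expects a bag, and $0 in c3 is a plain chararray, so group the filtered relation c3 before counting:

    -- keep only the tokens equal to 'hadoop'
    c3 = filter c2 by Name1 == 'hadoop';
    -- grouping collects the matching tuples into one bag per word
    grouped = GROUP c3 BY Name1;
    -- $0 is the group key (the word), $1 is the bag of matching tuples
    wordcount = FOREACH grouped GENERATE $0, COUNT($1);
    DUMP wordcount;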
    

    Let me know if it helps.
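
If only the bare number is wanted, as the question asks, one option is to group the filtered relation with GROUP ... ALL instead of grouping by word. This is a minimal sketch rather than a definitive script: it reuses the load path placeholder and relation names from the question, and the aliases `all_hadoop` and `total` are made up here for illustration.

```pig
-- Path is the placeholder from the question; substitute the real location.
c1 = load '/...../count.txt' using PigStorage(',') as (Name:chararray);
c2 = foreach c1 generate FLATTEN(TOKENIZE(LOWER(Name))) as (Name1:chararray);
c3 = filter c2 by Name1 == 'hadoop';

-- GROUP ... ALL puts every tuple of c3 into a single bag
-- (all_hadoop is a hypothetical alias, not from the original post).
all_hadoop = GROUP c3 ALL;

-- COUNT needs a bag; after grouping, the bag is named after the grouped relation, c3.
total = FOREACH all_hadoop GENERATE COUNT(c3);
DUMP total;
```

For the four matching rows shown in the question, this should print (4) rather than (hadoop,4).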