Assume I have a text file named count.txt that contains the following paragraph:
I am working in hadoop along with various courses like Hadoop, Hana, Java etc
I love working with hadoop
This is hadoop project
Now I need to find how many times the word hadoop occurs in the above file.
The following code is what I have tried:
c1 = load '/...../count.txt' using PigStorage(',') as (Name:chararray);
c2 = foreach c1 generate FLATTEN(TOKENIZE(LOWER(Name))) as (Name1:chararray);
dump c2;
c3 = filter c2 by Name1=='hadoop';
dump c3;
Here is the output I am getting:
(hadoop)
(hadoop)
(hadoop)
(hadoop)
What I need is the number 4, not the word hadoop repeated 4 times. Hence I tried to execute
`c4 = foreach c3 generate COUNT($0);`
and I am getting an error. Kindly help me; this may be a simple thing that I am unable to find. Thanks in advance.
Try this:
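COUNT expects a bag as its argument, but $0 in c3 is a single chararray field, so group c3 first and then count the grouped bag: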
c3 = filter c2 by Name1 == 'hadoop';
-- each group is (Name1, {bag of matching tuples})
grouped = GROUP c3 BY Name1;
wordcount = FOREACH grouped GENERATE $0, COUNT($1);
DUMP wordcount;
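Grouping by Name1 gives the word together with its count, i.e. (hadoop,4). If only the bare number is wanted, one possible variation is to use GROUP ... ALL, which puts every tuple of the filtered relation into a single bag. The following is a minimal sketch that reuses the load and tokenize steps from the question unchanged (including the elided path); the alias names all_hadoop and total are made up for illustration.

```pig
-- Load and tokenize exactly as in the question (path left elided as posted).
c1 = LOAD '/...../count.txt' USING PigStorage(',') AS (Name:chararray);
c2 = FOREACH c1 GENERATE FLATTEN(TOKENIZE(LOWER(Name))) AS (Name1:chararray);
c3 = FILTER c2 BY Name1 == 'hadoop';

-- GROUP ... ALL collects all tuples of c3 into one bag, which is named c3.
all_hadoop = GROUP c3 ALL;

-- COUNT now receives a bag, so it is valid here.
total = FOREACH all_hadoop GENERATE COUNT(c3);
DUMP total;
```

For the sample file this should print (4). Note that if the filter matches nothing, GROUP ... ALL produces no rows, so the script prints nothing rather than (0).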
Let me know if it helps.