Search code examples
apache-pig

PIG: using conditions


I have the following data set for a movie database:

Ratings: UserID, MovieID, Rating :: Movies: MovieID, Title :: Users: UserID, Gender, Age

Now Ive joined ratings and users. The goal is to determine each movieID rating by genders F and M together. Also to include movies which have atleast 20 ratings by F and M.

data = JOIN myuser BY user, myrating BY user;
grouped_users = GROUP data BY (movie,gender);

Now after the grouped_users, I need to filter out movie which has less than 20 ratings both genders. How can I do that?

grouped_users_twenty = FILTER grouped_users BY SIZE(grouped_users)>=20;

This is my logic. Getting errors.


Solution

  • You have to use COUNT and not SIZE.

    grouped_users_twenty = FOREACH grouped_users GENERATE group,COUNT(rating) as rating_count;
    final = FILTER grouped_users_twenty BY rating_count >= 20;