I have a simple Pig script. I want to extract the number of films for every year. I loaded the content of the file into movies and wrote this code:
groupingyear = group movies by year;
vrar = foreach groupingyear generate movies.year, COUNT(movies.year);
The counts are correct, but I want output in the form (year, number of films) rather than the structure I get now, where each year is written many times. Why are the years repeated?
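The years repeat because movies.year projects the year field out of every tuple in the grouped bag, so you get one copy per movie. Output group instead: after a GROUP, that field holds the grouping key (here the year) exactly once per group. Assuming you have a field movie_name in your movies dataset: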
groupingyear = group movies by year;
vrar = foreach groupingyear generate group, COUNT(movies.movie_name);
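For context, here is a complete sketch of the same idea. It assumes a comma-separated file at the hypothetical path movies.csv with hypothetical fields movie_name and year; adjust the LOAD statement to match your actual data.

```pig
-- Hypothetical input: movies.csv with one movie per line, e.g. "Heat,1995"
movies = LOAD 'movies.csv' USING PigStorage(',') AS (movie_name:chararray, year:int);

-- GROUP produces one tuple per year with two fields:
--   group  -> the grouping key (the year)
--   movies -> a bag holding every input tuple for that year
groupingyear = GROUP movies BY year;

-- Emit the key once, plus the number of tuples in its bag
vrar = FOREACH groupingyear GENERATE group AS year, COUNT(movies) AS num_films;

DUMP vrar;
```

A note on the counting function: COUNT skips tuples whose first field is null, so COUNT(movies) here ignores rows with a null movie_name. If every row should be counted regardless of nulls, COUNT_STAR(movies) can be used instead.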