I have a table with columns country (chararray), population (int), and zone (int). I need to find the country with the largest population among those where zone equals 1, and print its name and population to the console.
After loading the data, I tried these statements:
fl = filter st by zone==1;
grp = group fl by zone;
result = foreach grp generate fl.country,MAX(fl.population);
dump result;
It returns all the country names along with the population. I could use ORDER BY and LIMIT, but I need to do this with the MAX function.
I tried the FLATTEN operator, but it asks me to use an explicit cast. Can you please take a look?
Here is the sample data:
country,population,zone
india 3000 1
Australia 4000 2
US 5000 1
China 3000 1
Russia 500 1
The same can be accomplished this way:
A = load 'data' using PigStorage(' ') as (c:chararray,p:int,z:int);
B = filter A by z==1;
C = foreach (group B all) {
    ordered = order B by p DESC;
    limited = limit ordered 1;
    generate flatten(limited);
}
dump C;
The main advantage of this approach over MAX is that you can easily tweak it to return the top K (just change the argument of the limit statement). Also, I think it uses fewer map-reduce jobs: filtering is done in the mapper, and everything else is done in the reducer. Using MAX and then filtering on its result requires two jobs.
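For comparison, here is a rough sketch of what a MAX-based version might look like. The aliases G, M, J, and R and the field name maxp are my own naming, and the join against the one-row MAX result is just one way to do the follow-up filtering step; it assumes the same space-delimited 'data' file as above.

```pig
-- Load and filter the same way as in the snippet above.
A = load 'data' using PigStorage(' ') as (c:chararray, p:int, z:int);
B = filter A by z == 1;

-- Put all zone-1 rows into a single group and compute the largest population.
G = group B all;
M = foreach G generate MAX(B.p) as maxp;

-- Keep only the rows whose population equals that maximum
-- (M has a single row, so the join acts as a filter).
J = join B by p, M by maxp;
R = foreach J generate B::c, B::p;
dump R;
```

Unlike limit 1, this returns every country tied for the maximum, which may or may not be what you want.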