Dataset - Contains PostId and userID
20 1
21 2
45 3
85 1
48 1
98 1
74 1
96 2
63 2
33 3
44 3
55 3
66 3
77 3
I want to access the userID with maximum no. of post
PIG code
A = load '/home/cloudera/Desktop/post.txt' as (postid:chararray, userid:chararray);
B = load '/home/cloudera/Desktop/user.txt' as (name:chararray, id:chararray);
C = group A by userid;
D = foreach C generate group,COUNT(A.postid) as count;
E = order D by count DESC;
F = limit D 1;
It gives output -
(3,6)
Now what should be the PIG statement to access username from user.txt whose id is same as A.userid after execution of F statement?
Add another statement to get the first column from relation F
G = FOREACH F GENERATE $0;
DUMP G;