
Maximum value of a column in Apache Pig


I am trying to find the maximum value of the column ratingTime using Pig. I am running the script below:

    ratings = LOAD '/user/maria_dev/ml-100k/u.data' AS (userid:int,movieID:int,rating:int, ratingTime:int);
    maxrating = MAX(ratings.ratingTime);
    DUMP maxrating

Sample input data:

    196 242 3   881250949
    186 302 3   891717742
    22  377 1   878887116
    244 51  2   880606923

I am getting the error below:

     2018-08-05 07:02:05,247 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook 

     2018-08-05 07:02:05,914 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. <file script.pi    

Solution

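  • MAX is an aggregate function that operates on a bag, so it cannot be applied directly to a relation in an assignment. Group the whole relation with GROUP ... ALL first, then call MAX inside a FOREACH ... GENERATE over the grouped relation.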

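    ratings = LOAD '/user/maria_dev/ml-100k/u.data' USING PigStorage('\t') AS (userid:int,movieID:int,rating:int, ratingTime:int);
    -- GROUP ... ALL collects every tuple into a single bag so MAX can aggregate over it
    ratings_group = GROUP ratings ALL;
    -- the alias here must match the one defined by the GROUP statement above
    maxrating = FOREACH ratings_group GENERATE MAX(ratings.ratingTime);
    DUMP maxrating;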
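
To see why the grouping step matters, it can help to inspect the grouped relation and to compare GROUP ... ALL with grouping by a key. The following is a minimal sketch, assuming the same tab-separated u.data file and schema as in the question; the aliases by_user, max_per_user and lastRatingTime are names chosen here for illustration and are not part of the original post.

```pig
ratings = LOAD '/user/maria_dev/ml-100k/u.data' USING PigStorage('\t')
          AS (userid:int, movieID:int, rating:int, ratingTime:int);

-- GROUP ... ALL yields a single tuple: the constant key 'all' plus one bag
-- holding every tuple of ratings. DESCRIBE prints the resulting schema.
ratings_group = GROUP ratings ALL;
DESCRIBE ratings_group;

-- Per-key variant (illustrative): one bag per userid, so MAX runs once per user.
by_user = GROUP ratings BY userid;
max_per_user = FOREACH by_user GENERATE group AS userid,
                                        MAX(ratings.ratingTime) AS lastRatingTime;
DUMP max_per_user;
```

In both cases MAX receives a bag projected from the grouped relation (ratings.ratingTime). That bag is what the original script was missing when it called MAX directly in an assignment.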