Search code examples
hadoophivehiveql

hiveQL query select classID, studName from table1 where grade = Max(grade) group by classID, studName;


Sorry for writing an obvious bad query in subject but it accurately describes the result I am looking for.

I have a table with classID int, studName string, grade int. I need a result that lists each classID, studName that achieved the highest grade in each course. multiple students can achieve the grade and each should be listed in descending order first by classID then by studName.

sample output: | classID | studName| |---------|---------| | 101 | Mary | | 101 |Nate | | 101 |Chris | | 102 |Benjamin | | 103 |Nate | |103 |Tom | etc...

my first guess is:

SELECT classID, studName from table1 where grade = MAX(grade) group by classID, studName;

but this gives an error: ... not yet supported place for UDAF 'max'

I have also tried creating a view:

CREATE VIEW newView as select classID, MAX(grade) from table1 group by classID;

and using it in a subquery in the where statement:

select classID, studName from table1 where grade IN (select * from newView) group by classID, studName;

but it seems that : "SubQuery can contain only 1 item in SELECT List"

I have poured over "Apache Hive Essentials" book with no luck either.

I am new to HiveQL and this one is keeping me up at night. Any help would be GREATLY appreciated.

Thanks


Solution

  • Issue with your SQL -
    You need to name the max column and then use it in subquery IN clause.
    change the view like below.

    CREATE VIEW newView as select classID, MAX(grade) as maxgrade from table1 group by classID;
    

    Change your SQL like this

    select classID, studName from table1 where grade IN (select maxgrade from newView) group by classID, studName;
    

    Solution But this SQL will help you in achieving what you need 'find students who got max grade in each subject' by tweaking your SQL.

    select classID, studName, grade
    from table1 
    join (select classID mxid, MAX(grade) as maxgrade from table1 group by classID) mx -- this subquery will pick maximum grade in each class.
    ON grade =mx.maxgrade and classID = mx.mxid -- This join will select only the students with maximum grade in each class.